qbittorrent / qBittorrent

qBittorrent BitTorrent client
https://www.qbittorrent.org

Out of memory killer invoked on RPi4 4GB #18598

Closed piopodg closed 1 year ago

piopodg commented 1 year ago

qBittorrent & operating system versions

qBittorrent: 4.5.1-1 (nox)
Operating system: Arch Linux ARM
libtorrent-rasterbar: 2.0.8-3
HW: RPi4 4GB with NTFS drive connected via USB3 port
Kernel: 6.1.11-2
Swap file: 1GB

What is the problem?

For some unknown reason, "mount.ntfs invoked oom-killer" started appearing recently, crashing my RPi. Sometimes restarting the qbt service is enough; sometimes a power cycle is needed:

See journalctl log qbt-oom-killer-log.txt
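In case it helps anyone reproduce this, roughly how I pulled the OOM events out of the journal (the unit name below is from my setup and is only a guess for others, e.g. yours may be a templated qbittorrent-nox@<user> unit):

```sh
# Kernel messages for the OOM kill itself (oom-killer reports land in the kernel log)
journalctl -k -b | grep -i -A 20 "oom-killer"

# qBittorrent service messages around the crash (adjust the unit name to your setup)
journalctl -u qbittorrent-nox.service --since "1 hour ago"
```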

Deluge torrent performs without any problems during the same test.

Steps to reproduce

  1. Start qbittorrent
  2. qbt downloads some torrents (50-80GB each) - crashes randomly
  2a. Increase the reproduction rate by force-rechecking one big torrent (~50-70GB)

Additional context

I checked older versions (down to 4.4.5) and they still crash. The Deluge torrent client performs the same task (rechecking) without crashing.

Log(s) & preferences file(s)

No response

glassez commented 1 year ago

The Deluge torrent client performs the same task (rechecking) without crashing.

If the Deluge client you tested uses the same version of libtorrent, then most likely the difference comes from different (libtorrent) settings.

piopodg commented 1 year ago

The Deluge torrent client performs the same task (rechecking) without crashing.

If the Deluge client you tested uses the same version of libtorrent, then most likely the difference comes from different (libtorrent) settings.

I found that lowering the RAM limit from 512 MiB to 256 MiB helps (tested with rechecking). I don't know whether this parameter can be changed in Deluge, since it sits in the qBittorrent section of the Options. There is not much documentation for this parameter.

(screenshot of the memory limit setting in qBittorrent's Options)

I have no idea why this was not an issue earlier, because libtorrent 2.0 has been in use for well over a year now. Maybe the kernel has something to do with it.
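For reference, on my headless setup the same limit also shows up in the config file; the key name below is taken from my 4.5.x install and may differ between versions, so treat it as a sketch rather than documentation:

```sh
# Stop qbittorrent-nox before touching the config, then check where the limit lives.
# Key/section names are from my 4.5.x setup and may differ on other versions.
grep -n "MemoryWorkingSetLimit" ~/.config/qBittorrent/qBittorrent.conf
# On my box the key sits in the [Application] section, e.g. MemoryWorkingSetLimit=256
```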

shama84 commented 1 year ago

My guess is rather the NTFS driver under load. At first glance, do you have a good reason to use this format? I own a similar setup, and using NTFS cut my performance by 50 to 70% on simple SMB transfers. The main cost is CPU usage, but it has an adverse effect on memory when I/O comes into play (queuing --> memory usage).

If it is possible for you, try formatting the disk with a native ext filesystem. If you need to detach it and share it with a Windows machine, for example, you can try exFAT. In my experience, NTFS will hurt you at some point.
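If you do decide to reformat, something along these lines should work (sdX1 is just a placeholder; double-check the device first, as this wipes it):

```sh
# WARNING: destroys all data on the target partition - verify the device node first
lsblk -f

# Native Linux filesystem (best behaved on the Pi)
sudo mkfs.ext4 -L torrents /dev/sdX1

# Or, if the drive must stay readable on Windows
sudo mkfs.exfat /dev/sdX1
```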

piopodg commented 1 year ago

Deluge also crashed during download, so it seems there might be a problem with the NTFS driver itself. I will check an older kernel, but I'm closing this since it is not qbt-related.

csaavedra commented 1 year ago

Actually, that's a premature conclusion. Deluge also uses libtorrent underneath. You need to check if it's using the same version as qbittorrent. I say this because I have exactly the same problem, and I am using an ext4 external USB drive, so I don't think it's NTFS-related.

In my case deluged is using libtorrent 2.0.8.0.
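A quick way to compare the two, roughly (the deluged check assumes the libtorrent Python bindings are importable on your system; the pacman check is Arch-specific):

```sh
# libtorrent version deluged loads via the Python bindings
python -c "import libtorrent as lt; print(lt.version)"

# libtorrent library installed on the system (Arch); qBittorrent also shows
# the version it was built against in its About dialog and execution log
pacman -Qi libtorrent-rasterbar
```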

ipaqmaster commented 1 year ago

This isn't explicitly related to an RPi4 setup, though it may hint at a common cause.

I've also been using qBittorrent (4.5.4) on a KVM guest which accesses the host's media drives over NFS, doing its I/O over a virtual 10GbE link to the host. The hypervisor is well overspecced for the job, but the guest was targeted by the OOM killer three times this evening after consecutive restarts.

The guest running qBittorrent has 4GB of memory, and the fibre downlink here has a limit of ~30MB/s, which the qBittorrent guest regularly saturates while flushing to the host's zpool array over NFS.

Around 6PM this evening the qbittorrent service on the guest was hit by the OOM killer a few times. After a few service restarts and upping the guest's memory from 4GB to 8GB, I decided to ask the host what was wrong.

Dmesg had no useful info and neither did journalctl's general logging... but the zed (ZFS Event Daemon) service was logging some very obvious class=delay errors against vdev=ata-ST5000LM000-xxxxx-yyyyyyy-part1, one of the eight Seagate 2.5" ST5000s which make up the media array on this server.
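For anyone wanting to check the same thing, roughly what I looked at (the unit name is from my distro and may differ on yours):

```sh
# ZFS Event Daemon logs - repeated class=delay events against a single vdev are a bad sign
journalctl -u zfs-zed.service --since "18:00" | grep "class=delay"

# Per-vdev latency while the pool is under load
zpool iostat -v 5
```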

The 8x Seagate 2.5" ST5000s which make up the host's raidz2 media array unfortunately use Shingled Magnetic Recording, and Seagate being an early adopter of SMR when these drives were released, they feature NO support for the TRIM command. This means the host array slows down significantly when writing to sectors which were previously used and are now being re-written.

Because of this, I have two PCIe NVMe drives in the PCI slots of this machine to help alleviate this permanent SMR-management problem, which crops up every few months. Adding these NVMe devices as LOG and CACHE devices in the zpool has worked wonderfully for years, helping absorb large downloading periods in short time-spans while the 8x SMR drives catch up underneath.

I suspect my qbittorrent OOM experience this evening was caused by that one drive, the one ZED has been reporting class=delay errors for since 6PM, performing its usual SMR accounting overhead, which is what makes SMR drives so unbearable in the first place. This is made even worse because these drives in particular have no TRIM support, so the host cannot inform them when previously written data can be considered discarded. As a result, writing to "previously used but free" space is not communicated to the drive, and as that space is re-used (re-written) they show AVIO times anywhere from 300ms up to a showstopping 5000ms per IOP, as seen in atop.

The aforementioned PCIe NVMe drives can together store up to 362.6GiB+362.6GiB of cached reads from the zpool, and their second 10GiB partitions are mirrored, providing 10GiB of synchronous writes before everything bogs down. I very recently added many ISOs to the qBittorrent guest to help seed, and I suspect the 30MB/s (250mbps) connection quickly filled that 10GiB of zpool log space while this disk was struggling to handle its incoming writes - slowing down the entire zpool, the log drives, the NFS share and the torrent VM.

In fact, it lines up: qBittorrent kept crashing just about every 3 minutes after the initial crash. The guest had 4GiB of memory (mostly free) and a download rate of 30MB/s (250mbps) across many torrents; 4096MiB divided by roughly 30MiB/s is about 2.3 minutes to fill memory from empty. After the host's log devices filled up from waiting on the single slow array member, which slowed down the whole pool, it kept crashing every 3-4 minutes, which is about right for a mostly empty 4GB of memory and a 30MB/s download that could not be committed to disk, over and over again.

The moment I issued zpool offline zpoolName ata-ST5000LM000-xxxxx-yyyyyyy, the entire zpool took an obvious, ginormous breath, lowering load averages and un-hanging Plex media streams which were also having trouble at the time. qBittorrent also stopped crashing after the troublesome SMR disk was marked offline for a while while it "recovers".
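For completeness, the commands involved (pool and disk names are the placeholders from my setup):

```sh
# Take the struggling SMR member out of the data path (pool stays up, but degraded)
zpool offline zpoolName ata-ST5000LM000-xxxxx-yyyyyyy

# Confirm the pool state, and once the drive has settled, bring it back and let it resilver
zpool status zpoolName
zpool online zpoolName ata-ST5000LM000-xxxxx-yyyyyyy
```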


In short, if anyone runs into this issue on any platform: make sure your qBittorrent client isn't literally biting off more data than it can chew into a slower SATA array. This includes slow single-drive setups over any protocol (direct SATA, USB drives) and arrays with a very, very slow member holding everybody up. If you're experiencing qBittorrent crashes, I highly recommend immediately checking atop and looking for any red DSK lines, which may indicate disk slowness.
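Roughly what to run (atop highlights a DSK line in red once its busy percentage crosses the critical threshold; iostat is an alternative if you have sysstat installed):

```sh
# Refresh every 2 seconds; look for DSK lines near 100% busy with very high avio
atop 2

# Alternative: extended per-device statistics (sysstat package)
iostat -x 2
```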

In my case, it seemed qBittorrent was accidentally ballooning in memory, downloading files faster than it could finish committing the synchronous NFS writes to the host: the suddenly slowed array caused the write queue to back up, the torrent guest to run out of memory, and qBittorrent to be promptly killed. I don't know whether there's cause for the qBittorrent team to investigate this and try to prevent the application from effectively killing itself by downloading more than it can commit to storage quickly enough. But it was the answer for me and will probably be for many other visitors: slow storage causing downloaded data to back up.