Open Rid opened 3 years ago
I'm currently testing disabling the OOM killer for the container; I'm not sure how qBittorrent will handle malloc failing, but I'll reply here with the result.
A random question from a non-developer:
My understanding of out-of-memory conditions on Linux is that a process requesting memory aggressively enough can trigger a kernel panic even in the presence of the OOM killer, if there is also I/O activity (disk/network): I/O paths often make kmalloc calls that panic on failure, and the OOM killer can take a few milliseconds to respond to an oversized process. The trick is to ensure that the kernel always has enough free memory that kmalloc is unlikely to fail. This can be done by setting hard limits on individual process size using rlimits - i.e. RLIMIT_AS.
Setting RLIMIT_AS to some value lower than the total amount of memory available to the VM will cause qBittorrent's calls to malloc to fail, triggering whatever error handling qBittorrent has in place for that case (best case, it would likely exit immediately). With a reasonable amount of elbow room (say 1 GiB), the kernel should be able to clean up whatever the resulting I/O operations are without panicking.
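To illustrate the idea, here is a minimal Python sketch (the helper name `alloc_fails_under_cap` and the specific cap sizes are invented for the demo; on Linux, `resource.setrlimit` with `RLIMIT_AS` is the same knob a launcher script would set before exec'ing qbittorrent-nox):

```python
import resource

def alloc_fails_under_cap(cap_bytes, try_bytes):
    """Cap this process's address space (RLIMIT_AS), attempt an
    oversized allocation, and report whether it failed cleanly."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (cap_bytes, hard))
    try:
        buf = bytearray(try_bytes)  # the underlying malloc/mmap fails here
        del buf
        return False
    except MemoryError:
        # The allocation failed inside the process -- no OOM killer involved.
        return True
    finally:
        # Restore the original soft limit for the rest of the process.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

On Linux, with a 1 GiB cap a 2 GiB `bytearray` raises a catchable `MemoryError` rather than summoning the OOM killer, which is the failure mode argued for above.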
With respect to the issue: how should qBittorrent handle approaching a cgroup limit? Refuse to add more torrents? Remove existing torrents? Exit immediately? Ultimately, to do work, qBittorrent needs to allocate memory. It's possible to configure it to use less memory, but it's very hard to predict up front how much memory a given action will require. At some point qBittorrent would have to say "I'm too close to a cgroup limit - I refuse to do that", even though the action was probably safe. To reduce the expected behavior to absurdity, one solution is for qBittorrent to refuse to start in the presence of cgroup limits.
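Whatever policy were chosen, the application would first have to discover the limit. On cgroups v2 that means reading `memory.max` from the unified hierarchy; a rough sketch (the helper names are invented, and the hard-coded `/sys/fs/cgroup` root is a simplification that ignores nested cgroup paths from `/proc/self/cgroup`):

```python
from pathlib import Path
from typing import Optional

def parse_memory_max(text: str) -> Optional[int]:
    """Parse a cgroup v2 memory.max value: either the literal
    'max' (meaning no limit) or a byte count."""
    text = text.strip()
    return None if text == "max" else int(text)

def cgroup_memory_limit(cgroup_dir: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return this cgroup's memory limit in bytes, or None when
    unlimited or not running under a cgroup v2 mount."""
    try:
        return parse_memory_max((Path(cgroup_dir) / "memory.max").read_text())
    except (FileNotFoundError, ValueError):
        return None
```

Comparing that value against current usage (`memory.current`) is what a "refuse to do that" policy would hinge on.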
@CordySmith in Docker, cgroups v2 limits are set using the systemd driver, such that they govern all processes within a systemd slice under one set of parameters.
There is no separate kernel memory controller in cgroups v2, so it's not possible to limit kernel memory on its own.
It is possible to set memory.high in cgroups v2 without memory.max, which avoids the OOM killer: when usage goes over the high boundary, the processes are throttled and put under heavy reclaim pressure. That could be another possible solution, with memory.max set to a high value as a failsafe.
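Expressed declaratively, that combination maps onto systemd's resource-control directives; a hypothetical drop-in for the container's unit could look like this (the 9G/10G values are purely illustrative, not a recommendation):

```ini
[Service]
# Soft boundary: past this, processes are throttled and reclaimed,
# not killed (this is memory.high underneath).
MemoryHigh=9G
# Hard failsafe set well above the soft boundary (memory.max underneath).
MemoryMax=10G
```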
Currently we're moving back to cgroups v1 and disabling the OOM killer, which will hopefully work around the issue until we or someone else can make a PR against Docker.
OOM kill was disabled and the kernel panic still persists:
[74173.465416] usercopy: Kernel memory exposure attempt detected from SLUB object 'zio_buf_comb_16384' (offset 15632, size 17136)!
[74173.465516] ------------[ cut here ]------------
[74173.465518] kernel BUG at mm/usercopy.c:99!
[74173.465555] invalid opcode: 0000 [#1] SMP PTI
[74173.465587] CPU: 0 PID: 1601931 Comm: qbittorrent-nox Kdump: loaded Tainted: P OE 5.8.0-63-generic #71~20.04.1-Ubuntu
[74173.465655] Hardware name: Dell Inc. PowerEdge R730xd/0H21J3, BIOS 2.11.0 11/02/2019
[74173.465707] RIP: 0010:usercopy_abort+0x7b/0x7d
[74173.465738] Code: 4c 0f 45 de 51 4c 89 d1 48 c7 c2 9d 47 7e 85 57 48 c7 c6 e0 f4 7c 85 48 c7 c7 68 48 7e 85 48 0f 45 f2 4c 89 da e8 23 7c ff ff <0f> 0b 4c 89 e1 49 89 d8 44 89 ea 31 f6 48 29 c1 48 c7 c7 df 47 7e
[74173.465864] RSP: 0018:ffffb0e06dce7b50 EFLAGS: 00010246
[74173.465903] RAX: 0000000000000073 RBX: 00000000000042f0 RCX: 0000000000000000
[74173.465957] RDX: 0000000000000000 RSI: ffff8d533f818cd0 RDI: ffff8d533f818cd0
[74173.466006] RBP: ffffb0e06dce7b68 R08: ffff8d533f818cd0 R09: ffffb0e0095d4020
[74173.466048] R10: ffff8d53301cda30 R11: 0000000000000001 R12: ffff8d5fa1ae3d10
[74173.466097] R13: 0000000000000001 R14: ffff8d5fa1ae8000 R15: 0000000000000000
[74173.466147] FS: 00007f66fcff9700(0000) GS:ffff8d533f800000(0000) knlGS:0000000000000000
[74173.466202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[74173.466255] CR2: 00007f66e0fb5000 CR3: 0000000e491d6001 CR4: 00000000001606f0
[74173.466310] Call Trace:
[74173.466336] __check_heap_object+0xe6/0x120
[74173.466370] __check_object_size+0x13f/0x150
[74173.466527] zfs_uiomove_iter+0x61/0xf0 [zfs]
[74173.466649] zfs_uiomove+0x25/0x30 [zfs]
[74173.466766] dmu_read_uio_dnode+0xa5/0xf0 [zfs]
[74173.466887] ? zfs_rangelock_enter_impl+0x271/0x5c0 [zfs]
[74173.466982] dmu_read_uio_dbuf+0x47/0x60 [zfs]
[74173.467105] zfs_read+0x136/0x3a0 [zfs]
[74173.467227] zpl_iter_read+0xd8/0x180 [zfs]
[74173.467268] do_iter_readv_writev+0x18b/0x1b0
[74173.467320] do_iter_read+0xe2/0x1a0
[74173.467349] vfs_readv+0x6e/0xb0
[74173.467377] ? __secure_computing+0x42/0xe0
[74173.469480] do_preadv+0x93/0xd0
[74173.471495] __x64_sys_preadv+0x21/0x30
[74173.473578] do_syscall_64+0x49/0xc0
[74173.475551] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[74173.477889] RIP: 0033:0xdd46ba
[74173.479915] Code: Unable to access opcode bytes at RIP 0xdd4690.
[74173.481937] RSP: 002b:00007f66fcff53b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[74173.483834] RAX: ffffffffffffffda RBX: 0000000000000938 RCX: 0000000000dd46ba
[74173.485876] RDX: 0000000000000080 RSI: 00007f66fcff53f0 RDI: 0000000000000938
[74173.487623] RBP: 00007f66fcff53f0 R08: 0000000000000000 R09: 0000000000000000
[74173.489277] R10: 00000000005dfb90 R11: 0000000000000246 R12: 0000000000000080
[74173.490939] R13: 00000000005dfb90 R14: 00000000005dfb90 R15: 0000000000000000
[74173.492584] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 ip6_udp_tunnel udp_tunnel libcurve25519_generic libchacha libblake2s_generic xt_multiport act_mirred cls_u32 sch_ingress sch_hfsc veth nf_conntrack_netlink nfnetlink xfrm_user sch_fq_codel bridge stp llc aufs overlay xt_MASQUERADE xt_nat binfmt_misc xt_addrtype iptable_nat nf_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_state xt_conntrack iptable_filter bpfilter zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate mgag200 drm_kms_helper cec rc_core i2c_algo_bit fb_sys_fops syscopyarea sysfillrect mei_me mxm_wmi dcdbas sysimgblt mei ipmi_si ipmi_devintf ipmi_msghandler mac_hid acpi_power_meter tcp_bbr sch_fq
[74173.492634] nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ifb drm ip_tables x_tables autofs4 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid10 raid1 ses enclosure scsi_transport_sas ixgbe ahci xfrm_algo dca lpc_ich libahci crc32_pclmul tg3 mdio megaraid_sas wmi
It looks like qBittorrent is triggering a bug in the kernel; I'm not sure whether it's ZFS-specific or not.
Still relevant?
I'm still having these issues. Yes, still a problem.
I'm also having this issue on 4.6.3. With the memory limit in qBittorrent's Advanced settings set to 1024, the container's memory usage grows well beyond that for the above container. I'm using these memory commands: --memory=10g --memory-reservation=4g
Potentially related, some of my fastresume files have odd sections around pieces and piece priority, for example:
Some of these files were originally being seeded on a Windows client, then I moved them to a Linux Docker container. I wrote a tool to parse the bencoded fastresume file and bulk-change the `save_path` and `qBt-savePath` fields, leaving everything else alone, so I didn't need to manually change the destination in the client and do a recheck.
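For reference, a tool like that fits in a few dozen lines. Below is a minimal bencode round-trip sketch - not the author's actual tool, and `rewrite_save_path` is an invented name - that rewrites the two path fields and re-encodes everything else unchanged (bencode dictionaries are canonically key-sorted, so a decode/re-encode round-trip is stable):

```python
def bdecode(data, i=0):
    """Decode one bencoded value starting at offset i; return (value, next_i)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        j = data.index(b"e", i)
        return int(data[i + 1:j]), j + 1
    if c in (b"l", b"d"):              # list / dict: items until 'e'
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            v, i = bdecode(data, i)
            items.append(v)
        if c == b"l":
            return items, i + 1
        return dict(zip(items[::2], items[1::2])), i + 1
    j = data.index(b":", i)            # byte string: <len>:<bytes>
    n = int(data[i:j])
    return data[j + 1:j + 1 + n], j + 1 + n

def bencode(v):
    """Re-encode a decoded value (dict keys sorted, as bencode requires)."""
    if isinstance(v, int):
        return b"i%de" % v
    if isinstance(v, bytes):
        return b"%d:%s" % (len(v), v)
    if isinstance(v, list):
        return b"l" + b"".join(bencode(x) for x in v) + b"e"
    return b"d" + b"".join(bencode(k) + bencode(x)
                           for k, x in sorted(v.items())) + b"e"

def rewrite_save_path(raw, new_path):
    """Bulk-change save_path / qBt-savePath, leaving all other keys alone."""
    d, _ = bdecode(raw)
    for key in (b"save_path", b"qBt-savePath"):
        if key in d:
            d[key] = new_path
    return bencode(d)
```

Pointing `rewrite_save_path` at a `.fastresume` file's bytes with the new Linux path would produce a file the client accepts without a recheck, assuming the piece data itself is left untouched.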
> Potentially related, some of my fastresume files have odd sections around pieces and piece priority, for example
Do you have "first and last piece priority" enabled for these torrents?
> Do you have "first and last piece priority" enabled for these torrents?
Potentially, but through the web-ui I am unable to see that setting on completed torrents.
Bug report
Checklist
Description
qBittorrent info and operating system(s)
If on Linux, libtorrent-rasterbar and Qt versions
What is the problem
qBittorrent does not respect cgroup memory limits, resulting in it constantly being OOM-killed.
Detailed steps to reproduce the problem
What is the expected behavior
qBittorrent should respect cgroup memory limits so as not to be stuck in an endless OOM-kill loop.
Extra info (if any)
Kernel logs showing issue:
Attachments