rakshasa / rtorrent

rTorrent BitTorrent client
https://github.com/rakshasa/rtorrent/wiki
GNU General Public License v2.0

Crash: epoll called but file descriptor is active #634

Open wlerin opened 7 years ago

wlerin commented 7 years ago

I'm having two possibly related problems with rtorrent. They just started this week after years of continuous use (and it's been some time since I updated anything).

The first and most serious is a crash:

Caught internal_error: PollEPoll::open(...) called but the file descriptor is active
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent14internal_error10initializeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x227) [0x7f8e9ae62557]
rtorrent(_ZN7torrent14internal_errorC2EPKc+0x87) [0x456637]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent9PollEPoll4openEPNS_5EventE+0xb3) [0x7f8e9ae764d3]
/opt/rtorrent/lib/libtorrent.so.19(+0xc7b2b) [0x7f8e9aefbb2b]
/opt/rtorrent/lib/libtorrent.so.19(+0x62645) [0x7f8e9ae96645]
/opt/rtorrent/lib/libtorrent.so.19(+0xc592d) [0x7f8e9aef992d]
/opt/rtorrent/lib/libtorrent.so.19(+0xc2b7d) [0x7f8e9aef6b7d]
/opt/rtorrent/lib/libtorrent.so.19(+0xc46f6) [0x7f8e9aef86f6]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent9PollEPoll7performEv+0x14a) [0x7f8e9ae76c4a]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent9PollEPoll7do_pollEli+0x61) [0x7f8e9ae76cf1]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent11thread_base10event_loopEPS0_+0x124) [0x7f8e9aeb1f54]
rtorrent() [0x412474]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f8e99984830]
rtorrent() [0x415839]

This has happened multiple times, with the error always being identical. I've googled this a fair bit but nothing remotely informative comes up.

Second, and possibly related: I initially had network.max_open_files set to 10000, and that was just barely enough. After either the above crash or the one below, however, rtorrent's open file count immediately rose to 10000 on startup. So I bumped the limit up to 15000, and the count rose a little above 10000 before I stopped paying attention.

After crashing again, it immediately rose to 15000. I've now raised it to 50000 and it's sitting comfortably at 16640/50000, but I really don't think I have that many files active (I have about 65 torrents with 60~100 files each, plus ~390 other mostly single-file torrents, so while 10000 was believable, this looks like a bug). Is it possible open file descriptors could be preserved across a crash, so when it reopens they get duplicated?
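
One way to check what is actually behind a count like 16640 is to list the descriptors the process holds and group them by type. The following is a minimal sketch, assuming a Linux system with /proc mounted; the function names `open_fd_targets` and `summarize` are made up for this illustration and are not part of rtorrent or libtorrent.

```python
import os
from collections import Counter


def open_fd_targets(pid="self"):
    """Return the link target of every open descriptor of a process.

    Assumes Linux: reads the symlinks in /proc/<pid>/fd.
    """
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def summarize(targets):
    """Group descriptor targets into coarse kinds (hypothetical buckets)."""
    kinds = Counter()
    for target in targets:
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds
```

Calling `summarize(open_fd_targets(pid))` with the rtorrent process ID (run as the same user or as root) would show whether the total is dominated by regular files or by something else, and counting duplicate paths in the raw target list would show whether the same file is open more than once.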


Finally, the only recent change I can think of that might have triggered this: a few days ago I tried seeding a very large torrent from a mounted cloud drive, and after some hours of working passably I hit the quota and this crash occurred:

Caught SIGBUS, dumping stack:
rtorrent() [0x41643b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f651431d390]
/lib/x86_64-linux-gnu/libc.so.6(+0x9f840) [0x7f651372b840]
/opt/rtorrent/lib/libtorrent.so.19(+0x7f402) [0x7f6514bdb402]
/opt/rtorrent/lib/libtorrent.so.19(+0xc8d3d) [0x7f6514c24d3d]
/opt/rtorrent/lib/libtorrent.so.19(+0xce68a) [0x7f6514c2a68a]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent9PollEPoll7performEv+0xd1) [0x7f6514b9ebd1]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent9PollEPoll7do_pollEli+0x61) [0x7f6514b9ecf1]
/opt/rtorrent/lib/libtorrent.so.19(_ZN7torrent11thread_base10event_loopEPS0_+0x124) [0x7f6514bd9f54]
rtorrent() [0x412474]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f65136ac830]
rtorrent() [0x415839]

Error: Success
Signal code '2': Non-existent physical address.
Fault address: 0x7f64f8e342c0
The fault address is not part of any chunk.
Aborted

The two problems described at the top only started happening after this event. I've since removed the offending torrent.


It also bears mentioning that I'm actually using rtorrent-ps rather than straight rtorrent.

pyroscope commented 7 years ago

What makes you think your ulimits even allow that many handles?

wlerin commented 7 years ago

I'm fairly certain they do:

$ ulimit -n
100000

It also doesn't seem to be seeding much of anything now, but that could just be temporary.

pyroscope commented 7 years ago

FYI, if you start rtorrent via systemd, that shell value is not relevant.

wlerin commented 7 years ago

I'm currently not starting it via systemd, but instead running it via byobu to keep it persistent and interactable.
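
Since `ulimit -n` reports the limit of the shell it is typed into, the limit that applies to the already running rtorrent process can be read directly instead. The following is a minimal sketch, assuming Linux, where /proc/<pid>/limits lists the soft and hard limits of a process; `parse_nofile_limit` and `nofile_limit_of` are hypothetical helper names chosen for this example.

```python
def _to_limit(value):
    """Convert a limit field to int, or None when it is 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_nofile_limit(limits_text):
    """Extract (soft, hard) for 'Max open files' from /proc/<pid>/limits text."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return _to_limit(soft), _to_limit(hard)
    raise ValueError("no 'Max open files' line found")


def nofile_limit_of(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_nofile_limit(handle.read())
```

If `nofile_limit_of(pid)` for the rtorrent process returns a soft limit below the configured network.max_open_files, the shell's `ulimit -n` output was not the value in effect for that process.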