rakshasa / rtorrent

rTorrent BitTorrent client
https://github.com/rakshasa/rtorrent/wiki
GNU General Public License v2.0

tracking inaccuracy for some torrents #701

Closed joedefen closed 6 years ago

joedefen commented 6 years ago

I am experiencing tracking inaccuracies with rtorrent 0.9.4 and a private tracker when the "Last Updated" time (per rutorrent) is the same for a number of torrents. Torrents become synchronized when rtorrent is restarted and/or a set of torrents is manually started as a group.

Afterthought (4/2/2018): The problem may be correlated with load and bursts, but I think the problem existed before the torrents were synchronized (i.e., the synchronization factor is a red herring).

For synchronized torrents, the tracker only reliably updates the stats of one (likely the first sent) torrent. The tracker sets the "Interval" identically (30m0s) so that once synchronized, always synchronized (although it seems better to randomize the Intervals). The synchronized groups of only a few torrents trigger the problem.
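To illustrate the "once synchronized, always synchronized" point, here is a minimal sketch (not the tracker's code; `next_announces` and its parameters are hypothetical names) of how a fixed interval keeps announce times aligned, while a small random jitter on each interval spreads them out:

```python
import random

def next_announces(start_times, interval, rounds, jitter=0.0, rng=None):
    """Advance each torrent's announce time by `rounds` intervals.

    `jitter` is a fraction of the interval (0.1 means +/-10%); this is an
    assumed scheme for illustration, not what any real tracker does.
    """
    rng = rng or random.Random()
    times = list(start_times)
    for _ in range(rounds):
        times = [t + interval * (1 + rng.uniform(-jitter, jitter)) for t in times]
    return times

# Three torrents started together, 30-minute interval (1800 s), three rounds.
fixed = next_announces([0, 0, 0], 1800, 3)
spread = next_announces([0, 0, 0], 1800, 3, jitter=0.1, rng=random.Random(1))
print(fixed)   # all three values are identical
print(spread)  # values differ from each other
```

With no jitter, the three torrents keep announcing at the same instant indefinitely; with jitter, the burst dissolves after a round or two.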

Tracker support claims the tracking inaccuracy is due to frequent disconnects (i.e., it is the fault of rtorrent). I suspect that the tracker disconnects when it gets too many updates from one IP (e.g., to protect itself from overload by a malfunctioning client).

Afterthought (4/2/2018): Probably, tracker support noticed rTorrent's "normal behavior" of creating and destroying a connection for each update, which might be particularly noticeable when updates are synchronized. The following questions are moot. The control, network.http.max_open.set, does address the second, "closed loop" question.

Questions:

From complaints in tracker forums, I think this problem is not uncommon, but it generally goes undiagnosed. I circumvented the problem using pycore, a script and cron; regularly, all torrents are stopped and restarted with 2 second staggering. But, that is a sledgehammer solution, and the general problem festers.
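For reference, the staggered restart can be sketched roughly as below. This is not my actual script; `staggered_restart` and the `call` helper are hypothetical names, and the sketch stops and starts each torrent in turn, which is one reading of "staggering". The only rTorrent commands it relies on are `d.stop` and `d.start`.

```python
import time

def staggered_restart(call, hashes, delay=2.0, sleep=time.sleep):
    """Stop and start each torrent in turn, waiting `delay` seconds between torrents.

    `call(method, *args)` is any callable that sends one command to rTorrent,
    e.g. over XML-RPC (see the commented usage below).
    """
    for index, info_hash in enumerate(hashes):
        if index:
            sleep(delay)  # stagger so announces do not go out as one burst
        call("d.stop", info_hash)
        call("d.start", info_hash)

# Possible wiring over XML-RPC. The endpoint URL and how the hash list is
# obtained are assumptions that depend on your setup and rTorrent version:
#
#   import xmlrpc.client
#   proxy = xmlrpc.client.ServerProxy("http://localhost/RPC2")
#   call = lambda method, *args: getattr(proxy, method)(*args)
#   staggered_restart(call, list_of_info_hashes)
```

Passing `call` and `sleep` in as parameters keeps the loop testable without a running rTorrent instance.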

Afterthought (4/2/2018): The above "sledgehammer" was later found to be insufficient at times; sometimes, the ultimate sledgehammer (i.e., restarting rTorrent) is needed to free torrents from bad internal states.

joedefen commented 6 years ago

Followup: I ran more experiments, and logs verify that rTorrent is prone to sending spikes of GET requests for updates. But, in the latest experiments, I could not induce tracking errors, and all GETs were eventually acked.

I suspect the private tracker ceased throttling GET requests (or otherwise corrected tracking) because all other credible complaints about the mis-tracking suddenly stopped, too, when my tracking problems stopped.

Arguably, in a large community, even if clients are individually unsociable by sending spikes, the whole community is not spiking together and load should be somewhat distributed. And the servers can choose to randomize intervals if they want to spread load at the client level. So, calling this a server problem is tenable. I'll close the issue since it generated zero concern or interest.

Afterthought (4/2/2018): These observations discredit synchronization as a factor. There is no evidence the tracker was fixed; the reports were in a lull and later resumed.

kannibalox commented 6 years ago

Alternatively, you can limit the number of HTTP requests sent at once with network.http.max_open.set. You really don't need that many; fewer than 100 will serve your needs easily.

joedefen commented 6 years ago

Thanks, that looks like the relevant setting (which I overlooked) to regulate the spikes. The related .rtorrent settings (set by my seedbox provider) throughout have been:

#network.max_open_files.set = 191
#network.max_open_sockets.set = 1536
network.http.max_open.set = 32

The open files ulimit is 1024. The total number of active torrents (all trackers) is normally around 100. During mis-tracking incidents, typical would be 50 active torrents (errant tracker) and a dozen being mis-tracked.

If the problem resurrects, then I'll set "network.http.max_open.set" even lower than 32 and observe the effect; it may need to be much smaller (if any adjustment improves mis-tracking).

Thanks again.

pyroscope commented 6 years ago

if the nofile limit is indeed at the 1024 default, 1536 is a gross misconfiguration… besides the strange ratio between file and net handles.

joedefen commented 6 years ago

Thanks, pyroscope, for your observation. Note that values 191 and 1536 are commented out (I included them to show them as defaulted). From my reading of the code, those values are calculated per ulimit (although the documentation appears silent). Other system info:

$ ulimit -n
1024
$ cat /proc/sys/fs/file-max
807439
$ ps -ef | wc -l
395

I don't think the system limit is exhausted per configuration and use. Perhaps the ulimit of 1024 is? Are you suggesting to explicitly set these parameters? E.g., the docs suggest (when ulimit is 1024):

# To be sure, from the docs, not my config...
network.max_open_sockets.set = 999
network.max_open_files.set = 600
network.http.max_open.set = 99

but they seem to be set automatically (as I said) and, at first glance, properly.

If there were a problem with hitting the open file limit, would rTorrent record the last update times even if unable to emit the GET requests? (That would indeed explain all the problem symptoms, except perhaps the difficulty reproducing it.)
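One way to test the open-file-limit theory is to watch how many descriptors the rTorrent process actually holds. A minimal sketch, assuming Linux (it reads the `fd` directory under `/proc/<pid>`); the function names and the `proc_root` parameter are hypothetical:

```python
import os

def count_open_fds(pid, proc_root="/proc"):
    """Return how many file descriptors process `pid` currently holds (Linux only)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))

def fd_usage(pid, soft_limit, proc_root="/proc"):
    """Return (open_fds, fraction_of_soft_limit_used) for process `pid`."""
    used = count_open_fds(pid, proc_root)
    return used, used / soft_limit

# Example, with the pid of the rTorrent process and the ulimit of 1024:
#   used, fraction = fd_usage(rtorrent_pid, 1024)
# A fraction near 1.0 around a mis-tracking incident would support the theory.
```

Reading another user's process requires matching permissions, so this would have to run as the same user as rTorrent.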

pyroscope commented 6 years ago

What "the" docs? The official wiki template has other values.

joedefen commented 6 years ago

Sorry, I improperly copied the 999/600/99 above from https://github.com/rakshasa/rtorrent/wiki/Performance-Tuning

Should have been from https://rtorrent-docs.readthedocs.io/en/latest/cookbook.html

# Limits for file handle resources, this is optimized for
# an `ulimit` of 1024 (a common default). You MUST leave
# a ceiling of handles reserved for rTorrent's internal needs!
network.http.max_open.set = 50
network.max_open_files.set = 600
network.max_open_sockets.set = 300

Same question: do the defaults not work?
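As a sanity check on any such set of values, the handle budget can be computed by hand. The sketch below assumes the three pools are additive against the `nofile` limit, which is a conservative simplification and not taken from rTorrent's code; `handle_headroom` is a hypothetical name:

```python
def handle_headroom(nofile, max_open_files, max_open_sockets, http_max_open):
    """Return how many handles remain for rTorrent's internal needs.

    A negative result means the configured pools alone exceed the
    process limit reported by `ulimit -n`.
    """
    return nofile - (max_open_files + max_open_sockets + http_max_open)

def current_nofile():
    """Soft RLIMIT_NOFILE of this process (Unix only)."""
    import resource
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft

# Cookbook values against a limit of 1024:
print(handle_headroom(1024, 600, 300, 50))    # 74 handles left in reserve
# 1536 sockets can never fit under a limit of 1024:
print(handle_headroom(1024, 191, 1536, 32))   # negative
```

Under that assumption, the cookbook values leave a reserve, whereas any configuration with 1536 sockets overshoots a 1024 limit regardless of the other two settings.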

joedefen commented 6 years ago

Parenthetical musings:

Thus, anecdotally, rTorrent seems more reliable with the optimized settings. I wonder if my seedbox's defaults (32/128/768 respectively, I think, per code inspection) permit rTorrent to paint itself into some corner.

If it is a viable theory that the computed defaults cause problems, then perhaps the defaults could be computed to resemble the recommended, optimized settings? Anyhow, I'll use the recommended settings, not my seedbox's whim.

pyroscope commented 6 years ago

It's simply right vs. wrong/broken – "optimized" is the wrong word, try "fixed".

joedefen commented 6 years ago

I saw new cases of false hit-and-runs (HnRs); and I'm using the "optimized" rTorrent settings as suggested by 'pyroscope'. When noticed, the false HnRs had seeded over 12 hours, but the tracker credited no more than 0.1 hours. I stopped/started the two HnRs to no avail. I also restarted all the torrents in a staggered manner to no avail. [This destroys my previous theory that the tracker was throttling bursts of updates].

I restarted rTorrent (actually, to turn on tracker logging), and the restart corrected the problem (i.e., the HnRs started collecting stats properly). Perhaps, restarting torrents sometimes kicks them into a better state, but this time, a complete restart of rTorrent was needed.

My current hypothesis is that the mis-tracking is entirely an rTorrent bug. It seems that when a torrent goes into the Seeding state, there is a non-negligible chance that rTorrent stops updating the tracker for that torrent, although external indicators suggest all is OK.

Over the years (and recently too), I have noticed occasional rTorrent (0.9.4/0.13.4) problems including:

I could imagine a mutex bug or two (e.g., corrupting lists or maps) as the common cause of these observed defects. As a band-aid, I'll try periodic rTorrent restarts to hopefully correct any "bit rot". [Note: rTorrent does not flush stats to the tracker on exit, so that requires its own work-around.]

I see unfixed, open issues that might be related (e.g., race condition (#515) and numerous crash reports). Can anyone state whether 0.9.6 includes specific fixes that might reduce the problems stated above? If so, I'll poke my seedbox provider to update.