rakshasa / rtorrent

rTorrent BitTorrent client
https://github.com/rakshasa/rtorrent/wiki
GNU General Public License v2.0

Periodic peer disconnect and idling data transfer intervals #1261

Open kirillsc opened 4 months ago

kirillsc commented 4 months ago

Hi All,

I am observing periodic interruptions in data transfer among peers that are using rtorrent. I am not sure if this is a bug, but I would appreciate it if you could help me understand the issue.

I am using rtorrent for file distribution within a private network, specifically the private subnets of a VPC on AWS. I run my own open-source bittorrent-tracker (https://github.com/webtorrent/bittorrent-tracker). My use case consists of a single seeding server that needs to distribute a ~50GB folder among ~300 machines within the same private network. The seed server (1) creates the torrent file from the local folder and (2) shares this torrent file with all peers; (3) each peer then adds the torrent file to the /rtorrent/watch/start/ folder and connects to the same tracker server (within the same network). All ~300 peers start downloading at about the same time.

The problem that I am repeatedly observing is prolonged periods of complete idling among all peers in the network. For the first ~5 minutes everything works as expected: the seed server uploads at its maximum network bandwidth, and all the peers receive pieces and also distribute chunks among themselves. After those initial ~5 minutes, all peers disconnect from each other and idle for 5-10 minutes. Eventually the transfer resumes and lasts for another few minutes, only to disconnect again. Two important observations: (1) if I manually restart the source seed server during such an idle period, all transfers resume for another few minutes; (2) the problem does not appear to be limited to the seed server, because during the initial period of data transfer (before the first idle period) many peers manage to get the complete torrent, yet none of them share the data with the remaining peers during the idle period.

I am attaching log files from the seed server and one of the peer servers from one of my smaller scale experiments where I used only 20 peers.

In the log file I am seeing the following messages, but I am not sure about their relevance or how to debug them further. 



Handshake dropped: seeder rejected.
Received error: message:7 network error.
Upload unchoked slots adjust; currently:10 adjust:1
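
To see how often each of these messages occurs on the seed and on a peer, the log lines can be tallied. This is only a minimal sketch: the helper name `count_messages` is hypothetical, the match strings are copied from the excerpts above, and nothing else about the log format is assumed.

```python
from collections import Counter

# Substrings copied from the log excerpts quoted above.
PATTERNS = (
    "Handshake dropped: seeder rejected",
    "Received error: message:7 network error",
    "Upload unchoked slots adjust",
)


def count_messages(lines, patterns=PATTERNS):
    """Count how many lines contain each pattern (hypothetical helper)."""
    counts = Counter({p: 0 for p in patterns})
    for line in lines:
        for p in patterns:
            if p in line:
                counts[p] += 1
    return counts
```

Passing an open log file (for example `count_messages(open("server-log.log", errors="replace"))`) gives one count per message, which can then be compared between the seed and a stalled peer.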

I am using rTorrent v0.9.8 on RHEL 8.

I would appreciate any guidance on what the issue could be.

Thank you.

Attachments: server-log.log, client-log.log

kannibalox commented 4 months ago

Did everything work as expected in the smaller test? If not, would you happen to have a log from a peer that didn't successfully get past the stall? Can you share your config?

Just to break down the log messages you mentioned a bit:

One funky thing I see in the logs that I don't think is normal is that within the space of a second, rtorrent is starting an outgoing connection, receiving an incoming connection from the same host, then declaring that both connections received a network error. It's possible there's some weird race condition that happens in low-latency networks. I assume all the clients are currently receiving the torrent at essentially the same time; would it be possible to try staggering the start across the servers?
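
Staggering the start could be sketched roughly as below. This is an illustration only: the helper name `staggered_delays`, the host names, and the window length are all hypothetical, and each host would still need its own mechanism to wait for its delay before dropping the torrent file into the watch folder.

```python
import random


def staggered_delays(hosts, window_s, jitter_s=0.0, seed=None):
    """Return {host: delay_seconds}, spreading hosts evenly over window_s.

    An optional random jitter (0..jitter_s seconds) is added to each delay
    so that hosts do not start in lockstep.
    """
    rng = random.Random(seed)
    hosts = list(hosts)
    if not hosts:
        return {}
    step = window_s / len(hosts)
    return {h: i * step + rng.uniform(0.0, jitter_s) for i, h in enumerate(hosts)}
```

For example, `staggered_delays(["peer-a", "peer-b", "peer-c", "peer-d"], 20.0)` assigns delays of 0, 5, 10 and 15 seconds.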

kirillsc commented 4 months ago

Hi @kannibalox

Thanks for the quick response and explanation of the messages!

I was able to reproduce this issue using a single seeding server and a single client. I am attaching both logs and the configuration that was used. In this experiment the client stalled less than a minute after it started downloading the file.

To answer your last question, I am already spreading startup times across a 20-second interval; however, the objective is to distribute files as fast as possible. I can artificially slow the process further (say by 1-2 minutes), but the issue is still present in the smallest-scale tests.

Also, I don't want to divert this conversation from the original topic, but I have several times observed a client shutting down halfway through downloading a file. This happened when the rtorrent client had been launched as a detached daemon process. I am attaching this log file as client_error2.log in case it makes sense to you.

Let me know if I can provide any other debug information.

Thank you.

Attachments: seed_server1.log, client1.log, config.log, client_error2.log

kannibalox commented 4 months ago

Hm, 20 seconds would be enough to prevent the behavior I was thinking of, and there's not anything else obvious in the 1-on-1 logs. My interest is sufficiently piqued that I may see if I can replicate it. Are there any other noteworthy details about your setup?

As for client_error2.log, that looks like a normal shutdown procedure. Those can be triggered by SIGINT or SIGHUP, or by RPC calls; see https://rtorrent-docs.readthedocs.io/en/latest/cmd-ref.html#term-system-shutdown-normal for more information. If rtorrent encountered an error it couldn't handle or something, it would have just crashed hard instead.
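
If the shutdowns come from a SIGHUP delivered when the launching terminal or SSH session closes (an assumption; the log does not say which trigger fired), one way to rule that out is to start the process in its own session. A minimal Python sketch, where `launch_detached` is a hypothetical helper and the actual rtorrent command line is deliberately left out:

```python
import subprocess


def launch_detached(argv):
    """Start argv in a new session, detached from the caller's terminal.

    A process in its own session has no controlling terminal, so a hangup
    of the launching terminal is not delivered to it.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec
    )
```

If the client still shuts down when started this way, the trigger is more likely an explicit signal or an RPC call than a terminal hangup.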