zlatinb / muwire

MuWire file sharing client for I2P
GNU General Public License v3.0
189 stars 28 forks

No files will download if too many of them are being requested at once from the same source #90

Open Searinox opened 2 years ago

Searinox commented 2 years ago

I am not sure what the exact threshold for this is but 100 file downloads is a good margin for reproducing it. Time until retry was set to 30 seconds.

When too many files are requested from the same source, eventually no files get downloaded at all. This is best tested with relatively small (1MB max.) files. When all the downloads are started, a few will begin initially, but after ~5 minutes a state is reached where no downloads are occurring anymore.

Manually narrowing this down by pausing all downloads and allowing them to run in batches of about 20 files at a time restores functionality. The more files are active, the more time each spends competing for a download slot between attempts. Once that wait exceeds the configured download retry interval, the downloads start over, throwing away any progress made by the other instances, which then fail and in turn enter the retry loop.

My personal idea on resolving this is:

-In addition to enforcing the "maximum concurrent downloads per user" limit on incoming connections, MuWire instances should also advertise that setting, for example when a new download is requested.

-Download sources should be represented as objects and stored in a common list, to which new sources are added whenever at least one DownloadManager needs them and removed whenever the last DownloadManager using them exits.

-When a DownloadManager attempts to connect to a source, it should first look to see if this source already exists.

-All active DownloadManagers intending to use a source must make a call to the source object. Here the object, based on how many active downloads are known to already be running on it as well as the known source limit, will either give the go-ahead to the DownloadManager and add it to the total of running managers, or tell it to wait in line and try making the call again later.

In essence the download source object - armed with information about how many users are already downloading from it - would behave like a traffic cop in allowing or denying DownloadManagers to queue up.
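The traffic-cop object described above could be sketched roughly like this. All names here (DownloadSource, tryAcquire, release) are hypothetical illustrations, not MuWire's actual API:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-source arbiter: one instance shared by all
// DownloadManagers that target the same remote node.
public class DownloadSource {
    private final int maxConcurrent;            // advertised "max concurrent downloads per user"
    private final AtomicInteger active = new AtomicInteger(0);

    public DownloadSource(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Returns true (go-ahead) and counts the caller, or false (wait in line, retry later). */
    public boolean tryAcquire() {
        while (true) {
            int current = active.get();
            if (current >= maxConcurrent)
                return false;                   // all slots busy, caller backs off
            if (active.compareAndSet(current, current + 1))
                return true;                    // slot granted
        }
    }

    /** Called when a DownloadManager finishes with this source. */
    public void release() {
        active.decrementAndGet();
    }
}
```

A DownloadManager would call tryAcquire() before connecting and release() on exit, so the source object always knows how many managers are running against it.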

By no means is the above intended to be authoritative, I simply gave my own opinion on the matter since I understand resolving this won't be an easy task.

zlatinb commented 2 years ago

There are a few aspects to this issue, and it is worth looking into possible I2P-layer problems as well. By default there is no limit to the upload slots per user, so if this behavior still happens in that case, then there is a problem establishing I2P connections.

Another thing I've observed is that downloads do connect and start fetching the hash list, but then fail. The hash list is always provided, and upload slot availability is checked after that. This could indicate that the uploader has set a limit on the upload slots, but it needs to be verified.

If we identify that indeed it is the upload slot limit that is causing the problems, then there are several solutions which can be considered. I'm not going to go into them yet because I want to understand exactly what is causing this issue. Adding the following line to logging.properties will give us some more insight on what is going on:

com.muwire.core.download.level=FINE

I'm going to do some experimenting and will post an update once I know more.

Searinox commented 2 years ago

The load balancing would run into more than just the issue of managing a known download limit. Even with a sensible limit in place, the network connection at a given moment may not be up to what the client can nominally handle, so downloads could still fail even when few are active. The logic would also need to assess repeated download failures over time and dynamically adjust the target up and down depending on how the host is responding.

zlatinb commented 2 years ago

I was able to reproduce this with a node under my control that does not have an upload slot limit. It is an I2P issue: more specifically, the I2P layer limits the number of outstanding SYN packets to 64, which means that if more than 64 downloads are attempted simultaneously the queue gets clogged up.

What makes the problem worse is that the downloading node will keep retrying the downloads, and hence the reconnects, until it saturates the uploading node with SYN packets.

I'm going to experiment with delaying multiple download attempts by some small value like 100ms...

Searinox commented 2 years ago

If there is a connection limit in I2P, wouldn't it make sense to also limit the number of simultaneous downloads a user can allow in MuWire's UI to match it? Or at least, have the MuWire connection setting propagate to I2P's config as well.

zlatinb commented 2 years ago

By default there is no limit to the number of simultaneous connections after they've been established; the limit is on the number of connections in the process of establishment.

In build 90 I've throttled new downloads to 10/second. I've tested this with ~2000 downloads from the same host; it worked much better although it still overwhelmed the host. I then tested 300 downloads and those worked fine.
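Roughly, a 10/second throttle like this amounts to a permit-per-window gate. This is an illustrative sketch, not the actual build-90 code:

```java
// Minimal rate gate: allows at most `permitsPerSecond` download starts
// per one-second window, measured on a caller-supplied clock (millis).
public class StartThrottle {
    private final int permitsPerSecond;
    private long windowStart = -1_000_000L; // sentinel: first call always opens a window
    private int used = 0;

    public StartThrottle(int permitsPerSecond) {
        this.permitsPerSecond = permitsPerSecond;
    }

    /** Returns true if a new download may start at time nowMillis. */
    public synchronized boolean tryStart(long nowMillis) {
        if (nowMillis - windowStart >= 1000) { // a new 1-second window begins
            windowStart = nowMillis;
            used = 0;
        }
        if (used < permitsPerSecond) {
            used++;
            return true;
        }
        return false; // window exhausted; caller should delay and retry
    }
}
```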

I don't know what the proper solution to this should be. Please try https://muwire.com/downloads/MuWire-0.8.9-GitHub90.zip and let me know how it behaves.

Searinox commented 2 years ago

Okay so, after some minutes I can confirm that downloads are indeed making it through, at least to some extent. What I did notice is that when I was manually doing my downloads in batches of 20 each, you'd get 1-4 files that start downloading and continue consistently until they finish. In this new build, any one download I try to follow gets significantly less "air time", as it is soon interrupted so others can continue. There is definitely some speed degradation from all this jumping from one download to another, but it is at least actually downloading rather than doing nothing at all, which is what was happening before.

zlatinb commented 2 years ago

I've had an item on my todo list to implement a download queue with priorities. Having something like that would make it possible to limit the number of simultaneous downloads per host.

Another solution is to implement multiplexing and to use a single I2P-layer connection for multiple files. This would require protocol changes.

Another very simple solution is to limit the number of connections that can be in the process of establishment to any given host at any given time. That can be done with a Semaphore per I2P destination. That's what I've done in build 90-2 and it works much better in my testing - I was able to download 2200 files in one go without overwhelming the uploader.
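The per-destination semaphore amounts to something like the following sketch. The String destination key and the tryBeginConnect/endConnect names are illustrative; the actual build 90-2 code may differ:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Caps how many connections may be *in the process of establishment*
// to any single I2P destination at once. Established connections
// are not limited, matching the I2P-layer behavior described above.
public class ConnectGate {
    private final int maxPending;
    private final Map<String, Semaphore> perDestination = new ConcurrentHashMap<>();

    public ConnectGate(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Try to begin connecting to `destination`; false means back off and retry later. */
    public boolean tryBeginConnect(String destination) {
        return perDestination
            .computeIfAbsent(destination, d -> new Semaphore(maxPending))
            .tryAcquire();
    }

    /** Must be called once the connection attempt has succeeded or failed. */
    public void endConnect(String destination) {
        Semaphore s = perDestination.get(destination);
        if (s != null) s.release();
    }
}
```

Because the permit is released as soon as establishment finishes, many simultaneous downloads from one host simply queue their connection attempts instead of flooding the uploader with SYN packets.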

https://muwire.com/downloads/MuWire-0.8.9-GitHub90-2.zip

Searinox commented 2 years ago

Download consistency is definitely improved.

I did however have an issue where only one file was "Downloading", except it was stuck in that state for several minutes until I paused and resumed it. Eventually everything resumed, including that file. This was in the early minutes after reconnecting to the network, so it may have been hampered by tunnels still forming, I don't know; but just in case, maybe take a look to see whether anything in the downloading state is at risk of deadlocking under the new logic. I'll let you know if it happens again.

I understand this is not an easy issue to fix and requires a major code overhaul. I also understand that a release is coming soon. Unless you come up with any new updates for this, I think it is acceptable in its current form performance-wise. The issue could remain open pending a final implementation and any potential new bugs.

As a final thing I'd like to point out that the speed counters for the downloads, as well as the bottom total, often report exaggerated speeds, going into the MB/s and even into the tens. This seems to get worse the more concurrent downloads there are, but it is still a minor issue.

Again the download performance is quite okay right now.

Searinox commented 2 years ago

So I stopped sorting files by download speed and sorted them by name instead, which results in virtually no shuffling in the downloads list while they're ongoing, and the speeds look normal everywhere again. My guess is that the delays created by GUI updates are causing the buggy speed values.

zlatinb commented 2 years ago

Regarding the download speed counters - yes, they're not very reliable even in the best of times, and I haven't quite figured out how to improve them. Right now, the more regularly they are read, the more accurate they are; if they're read at irregular intervals they can show crazy values. That explains what you describe when sorting the downloads differently.
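One way to make a counter tolerate irregular polling is to divide by the actual elapsed time between reads rather than assuming a fixed tick. A hedged sketch (not MuWire's actual counter):

```java
// Speed counter that computes the rate over the real elapsed interval
// between reads, so irregular polling doesn't inflate the value.
public class SpeedCounter {
    private long bytesSinceRead = 0;
    private long lastReadMillis;

    public SpeedCounter(long nowMillis) {
        lastReadMillis = nowMillis;
    }

    /** Record bytes transferred since the last call. */
    public synchronized void record(long bytes) {
        bytesSinceRead += bytes;
    }

    /** Bytes per second since the previous read, measured at time nowMillis. */
    public synchronized double read(long nowMillis) {
        long elapsed = Math.max(1, nowMillis - lastReadMillis); // guard against zero interval
        double rate = bytesSinceRead * 1000.0 / elapsed;
        bytesSinceRead = 0;
        lastReadMillis = nowMillis;
        return rate;
    }
}
```

If the GUI polls late, the same byte count is spread over a longer measured interval, so the reported rate drops instead of spiking.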

I've made what I believe are the final tweaks I'll do regarding this issue for this release cycle in 90-3. It basically reverts the complicated first change to throttle starting of downloads and leaves the semaphore limit on the connection attempts per destination. https://muwire.com/downloads/MuWire-0.8.9-GitHub90-3.zip

Please keep an eye on downloads that behave weirdly. I saw a few which were 100% complete yet kept trying to reconnect with the 90-2 version.

Searinox commented 2 years ago

Before I try this I am also going to report: a multi-upload attempt from my node caused me to lose connection to all of my ongoing downloads. Everything is now stuck in a download/connecting/failed loop with no work being done anymore, even though those transfers ended some minutes ago. How are uploads affecting this?

zlatinb commented 2 years ago

There is nothing at the MuWire layer that mixes downloads and uploads; but at the I2P streaming layer the receiver of the data needs to send ACK packets to the sender of the data. If the uploads saturate the available bandwidth then it increases the probability the ACKs could get lost.

Note that unlike with clearnet connections, you rely on the number of tunnels which you have configured MuWire to use. By default that number is 4, with a maximum of 6. If you're uploading to someone and downloading from someone else via a different tunnel, you should have little to no problems. But if the two I2P streaming connections happen over the same tunnel, then it's very easy to overload it.

That doesn't quite explain a connecting/failed loop, nor does it explain a connecting/downloading/failed loop. Unless I manage to reproduce this case I'll have to point the finger at the I2P streaming layer.

Searinox commented 2 years ago

The person that caused this would likely be able to do it again on my node by performing the same action they did last time.

Searinox commented 2 years ago

Yep, there go my downloads again! Let's see if they ever recover on their own...

Searinox commented 2 years ago

They seemed to recover for a moment, something I also saw last time, but it was extremely weak: 1-2 KB for a few seconds, then renewed silence. Then suddenly there was another surge and it worked as normal for about a minute. Now it's all dead again. My guess is the bandwidth is there but it can't get a spot to do one thing or another. I am going to limit the number of upload slots per user to 4, so there should be a big difference shortly...

EDIT: I am progressively raising the limit per user to see when it starts to choke. Refresh here to see where I'm at right now.

Note: It appears that whenever I change that limit, all uploads temporarily die out and then revive slowly.

CURRENTLY: 96

Update: As I went from 16 to 32 I lost significant download speed. Up to that point the loss had been minimal to none. Update: 64 took it down to 10% of the usual speed, with downloads barely moving.

EDIT: Test done.

Searinox commented 2 years ago

At 96 upload slots the speed for the download is half of normal, significantly higher than the 10% I experienced at 64. At this point I am not sure what this means for the degradation of speed, since clearly there is a lot of variation based on the number of active tunnels, the quality of those tunnels, how far away those tunnels link, etc.

I reduced it back to 1 and I see the speeds are mostly the same. I am tempted to write this off as a poor-quality observation, tainted by the many unpredictable variables inherent to I2P's infrastructure.

As I sit here with only 1 slot open I notice 4 simultaneous uploads and I'm confused as to why. Slowly the other instances are vanishing, so I guess the transition is gradual. EDIT: Okay, it was gradual; it's now down to 1.

I think I've gotten an idea of some healthy margins for upload slots, though. One per user actually seems best speed-wise. As I noted, simultaneous multi-download does add some overhead; with one slot per user the upload speed has essentially doubled, and at the same time it leaves my downloads much more stable.

The takeaway for me here is that downloading files one at a time from other users seems to be the fastest. I wish there were a way to configure this limit per user. "Download sequentially" does exist, but once you've committed to using it (or not) you can't change that anymore.

One last thing: I saw it's possible to set a Total Upload Slots > Upload Slots Per User. I understand settings validation is considered minor and subject to a later update, but what does the client do then?

zlatinb commented 2 years ago

Interesting test, I hadn't done such test myself so this is new to me.

You're basically right that downloading one file at a time from a user will be the fastest. The download sequentially checkbox does something else though - it makes sure the pieces of a file get requested in order. This helps with the "Preview" function for some media types.

The total upload slots means the total slots for the entire instance across all users. MuWire checks both values when deciding whether to allow an upload request to proceed. Note that it does this for every piece that is requested, so there may be multiple uploads going, but only a certain number of pieces will be allowed at a time. If the user configures total upload slots to be less than upload slots per user, then the smaller of the two values matters.
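In effect the per-piece check described above boils down to testing both limits, with whichever is smaller being the one that bites. A sketch with hypothetical names (SlotPolicy, allowPiece):

```java
// Decide whether a piece-upload request may proceed, given the counts
// of active piece uploads for this user and for the whole instance.
public class SlotPolicy {
    private final int perUserLimit;   // "Upload Slots Per User" setting
    private final int totalLimit;     // "Total Upload Slots" setting

    public SlotPolicy(int perUserLimit, int totalLimit) {
        this.perUserLimit = perUserLimit;
        this.totalLimit = totalLimit;
    }

    public boolean allowPiece(int activeForUser, int activeTotal) {
        // Both limits must hold simultaneously, so the smaller one governs.
        return activeForUser < perUserLimit && activeTotal < totalLimit;
    }
}
```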

Searinox commented 2 years ago

Is there a possibility of a Max Simultaneous Downloads Per User in the future? From a performance standpoint it still makes sense to be downloading multiple files at once, as long as they are from different users.

zlatinb commented 2 years ago

I will need to implement a download queue before a max-downloads-per-user limit can be coded. A download queue will have locally queued downloads, which are different from remotely queued downloads. At LimeWire we had both local and remote queues and I think it worked quite well. But it depends a lot on the nature of the traffic; eMule used to have remote queues thousands of slots long, and it sometimes took days for a download to begin. (That was more than 10 years ago, the last time I used it.)

Searinox commented 2 years ago

Sounds sticky. Thanks for helping out, I'll leave this issue as it is for you to decide what comes next, unless I find more related bugs.