novoda / download-manager

A library that handles long-running downloads, handling the network interactions and retrying downloads automatically after failures
Apache License 2.0

Is there any possibility to bypass the initial HEAD request? #491

Open mspmax opened 5 years ago

mspmax commented 5 years ago

Hi guys, we are facing an issue at the moment: we do high-volume downloads (e.g. 100 or more files per batch), and it takes a considerable amount of time before downloading starts. This is mainly because the HEAD requests are done sequentially. Is there a way to improve this, or a way to provide the content length before downloading? Also, I noticed that it does another HEAD request on every download (parallel item) completion. Any reason for that?

Cheers !

zegnus commented 5 years ago

Hi @mspmax, the problem with passing the length beforehand is that there is no guarantee the values you provide will match the ones reported by the endpoint, which could cause corrupted downloads. I guess this is a risk to consider if we add this option.

In terms of the performance issue, parallel downloads are not currently supported, and supporting them would require a big change in the library. I am also not 100% sure that enabling parallel downloads would speed up the batch: if we assume that one download uses the maximum bandwidth available and that the endpoint can fill that bandwidth, then parallel downloads won't improve much, but they will have a big impact on the library's memory and resource usage.

If you can give us an example we can use to stress test the library, we will try to think of easy ways to speed it up.
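The bandwidth argument can be made concrete with a back-of-the-envelope model. All symbols here are assumptions introduced for illustration: $n$ files of sizes $s_1, \dots, s_n$ with total size $S$, a link of bandwidth $B$ that a single download can saturate, and $k$ parallel downloads that share the link equally.

$$
T_{\text{seq}} = \sum_{i=1}^{n} \frac{s_i}{B} = \frac{S}{B},
\qquad
T_{\text{par}} \approx \frac{S}{k \cdot (B/k)} = \frac{S}{B},
\qquad
S = \sum_{i=1}^{n} s_i
$$

Under these assumptions the transfer time is the same either way. Parallelism would only be expected to help when a single connection cannot saturate the link, or when per-request latency rather than bandwidth dominates.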

mspmax commented 5 years ago

Hi @zegnus, thanks for the prompt reply. I'm a bit confused here: when we add a few items to a Batch, does that mean they are downloaded sequentially rather than in parallel? But there can still be any number of items per batch, right?

zegnus commented 5 years ago

@mspmax you are correct. Everything is sequential. You can add any number of items per batch and any number of batches, and the library will process them one by one until everything is finished. The number of items to download has no performance impact in terms of network calls, and you won't overload an endpoint with hundreds of open connections.

mspmax commented 5 years ago

Thanks, but the main issue is around the HEAD requests, as they take a considerable amount of time. Basically, for a list of 100 items in the batch it sends 100 HEAD requests before starting to download.

Mecharyry commented 5 years ago

> does another HEAD request on every download (parallel item) completion

That's interesting, I'll take a look at that particular problem and open another issue to represent it.

> for a list of 100 items in the batch it sends 100 HEAD requests before starting to download.

The issue here is not with the actual download, @zegnus. It's with the smaller requests that we do at the beginning, which could be done in parallel. We could probably do that, or allow clients to specify the size up front and trust what they tell us; if it is incorrect, the downloads will fail from that point.
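The first option, running the size lookups in parallel, could be sketched roughly as follows. This is a minimal illustration, not code from download-manager: the class `ParallelSizeLookup`, its method names, and the `maxConcurrent` parameter are all hypothetical. The lookup itself is passed in as a function so that the bounded thread pool is the only moving part; `headContentLength` shows one way such a lookup might be written with the standard `HttpURLConnection`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

/** Hypothetical helper for illustration; not part of download-manager. */
final class ParallelSizeLookup {

    /** Sends one HEAD request and returns Content-Length, or -1 if it is not reported. */
    static long headContentLength(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            try {
                connection.setRequestMethod("HEAD");
                return connection.getContentLengthLong();
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Resolves the size of every URL using at most maxConcurrent lookups at a time.
     * The returned map keeps the order of the input list.
     */
    static Map<String, Long> lookupSizes(
            List<String> urls, ToLongFunction<String> lookup, int maxConcurrent) {
        // A fixed pool caps the number of simultaneous requests hitting the server.
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String url : urls) {
                Callable<Long> task = () -> lookup.applyAsLong(url);
                pending.add(pool.submit(task));
            }
            Map<String, Long> sizes = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                sizes.put(urls.get(i), pending.get(i).get());
            }
            return sizes;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Size lookup failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while looking up sizes", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A caller might write `ParallelSizeLookup.lookupSizes(urls, ParallelSizeLookup::headContentLength, 4)`. Keeping `maxConcurrent` small is one way to limit the extra load on the server.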

mspmax commented 5 years ago

Exactly what I'm referring to, @Mecharyry. We are already getting the size from the backend as metadata, so we could avoid the HEAD request overhead. In this specific case, all the small chunks are equally sized. An option to specify the size explicitly would be really helpful, as we are currently a bit stuck with a production release because of this issue.

Thanks in advance!

zegnus commented 5 years ago

@Mecharyry as a quick fix, I would rather add a builder option for specifying the size of each file than rush into a multithreaded solution for requesting the file sizes.
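The idea of a client-supplied size could look something like the sketch below. It is only an illustration of the approach under discussion: `DeclaredSizes`, `FileRequest`, `withSize`, and `resolveSizes` are hypothetical names and do not describe the library's actual API or what the eventual change implements. The point it shows is that a network lookup is made only for files whose size was not declared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;
import java.util.function.ToLongFunction;

/** Hypothetical types for illustration; these are not download-manager classes. */
final class DeclaredSizes {

    /** A file to download, optionally carrying a size supplied by the client. */
    static final class FileRequest {
        final String url;
        final OptionalLong declaredSize;

        private FileRequest(String url, OptionalLong declaredSize) {
            this.url = url;
            this.declaredSize = declaredSize;
        }

        /** No size known up front; it has to be looked up over the network. */
        static FileRequest of(String url) {
            return new FileRequest(url, OptionalLong.empty());
        }

        /** Size known up front, for example from backend metadata. */
        static FileRequest withSize(String url, long sizeInBytes) {
            if (sizeInBytes < 0) {
                throw new IllegalArgumentException("size must not be negative");
            }
            return new FileRequest(url, OptionalLong.of(sizeInBytes));
        }
    }

    /**
     * Returns the sizes in request order, calling networkLookup only for
     * requests that did not declare a size.
     */
    static List<Long> resolveSizes(
            List<FileRequest> requests, ToLongFunction<String> networkLookup) {
        List<Long> sizes = new ArrayList<>();
        for (FileRequest request : requests) {
            long size = request.declaredSize.isPresent()
                    ? request.declaredSize.getAsLong()
                    : networkLookup.applyAsLong(request.url);
            sizes.add(size);
        }
        return sizes;
    }
}
```

With this shape, a batch of 100 files that all declare their size would trigger no size lookups at all, while files without a declared size would fall back to the existing behaviour.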

mspmax commented 5 years ago

Yes, I agree with @zegnus, because a multithreaded solution would put extra load on the servers as well.

Mecharyry commented 5 years ago

💥 https://github.com/novoda/download-manager/pull/492

This will need to be reviewed by some others, and then we can get it on a snapshot for you to try, @mspmax. I need to do some CI work before that, however. Hopefully it will be released as a full version by the end of the week.

mspmax commented 5 years ago

@Mecharyry you're the best ! 👍 👍

Mecharyry commented 5 years ago

Hey @mspmax, sorry for the delay in shipping this. I'm currently waiting for someone to get back from holiday to help me out with the CI for this project.

mspmax commented 5 years ago

@Mecharyry no worries, and thanks for letting me know!

mspmax commented 5 years ago

By the way @Mecharyry, should the provided file size be exactly the same as the actual one? Sometimes we might only have an estimate.

Mecharyry commented 5 years ago

@mspmax it should be exactly the same. I think there are checks that will fail otherwise during a download.
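To illustrate why an estimate would not be enough, here is a minimal sketch of the kind of check that rejects a size mismatch. It is an assumption about how such a check could work, not the library's actual implementation; `SizeCheckedCopy` and `copyExpecting` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper for illustration; not part of download-manager. */
final class SizeCheckedCopy {

    /**
     * Copies source to target and fails unless exactly expectedBytes were received.
     * Returns the number of bytes copied.
     */
    static long copyExpecting(InputStream source, OutputStream target, long expectedBytes)
            throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = source.read(buffer)) != -1) {
            total += read;
            // Stop as soon as the stream exceeds the declared size.
            if (total > expectedBytes) {
                throw new IOException(
                        "Received more than the declared " + expectedBytes + " bytes");
            }
            target.write(buffer, 0, read);
        }
        // A short stream is just as wrong as a long one.
        if (total != expectedBytes) {
            throw new IOException(
                    "Expected " + expectedBytes + " bytes but received " + total);
        }
        return total;
    }
}
```

With a check like this, a declared size that is off by even one byte makes the download fail, so an estimated size would not be safe to pass.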

mspmax commented 5 years ago

Hi @Mecharyry any updates on this? Thanks

Mecharyry commented 5 years ago

Hi @mspmax, unfortunately we are having some issues with our Bintray accounts, where we host these releases. We've been trying hard to get them resolved, but so far we haven't had a lot of luck. I'll try to keep you updated as I hear more. Sorry for the delay.