Are you sure it's not the download server that's limiting your download speed? I just tested a bit with large Danbooru files and `-r`/`--limit-rate` works as expected there.
The chunk size is rather small and there should definitely be an option to configure it, but I don't think that is what's primarily limiting your speeds.
Can't say that I've ever noticed gallery-dl bottlenecking my download rate. The site/server is usually the limiting factor (or a proxy, if used); on many occasions gallery-dl saturated my download bandwidth just fine.
Thanks for the reply, folks. I am having this issue on all sites that I have tested, such as Reddit, Imgur, Gfycat, etc. If it is allowed, I can share a link here.
Changing the `chunk_size` in the code does fix this issue for me. Also, without the rate limit (no `-r` flag) everything downloads at full speed as expected. In that case the `receive` function is called, which also uses the default `chunk_size`. So it must be the hundreds of iterations inside the `_receive_rate` function that are causing the slowdown. It has a few math operations and an I/O call for writing the progress bar to the console. There is a good chance that all of this is noticeable because my computer is very old and slow.
PS: I'm using the latest version of the library (`gallery-dl==1.23.5`).
> due to my computer being very old and slow
That might be the reason for your problem, given how "fast" Python is.
The simple `receive()` only gets used when there is no rate limit and no download progress. Otherwise the more computationally expensive `_receive_rate` is used.
Anyway, commit https://github.com/mikf/gallery-dl/commit/bca9f965e5f5009f66fb19e52cb0fd9cc054a368 adds a `chunk-size` option with which you can configure this value however you want without touching any actual code, and it also increases the default chunk size to 2^15 = 32 KiB (from 2^14 = 16 KiB). 1 MiB is far too much as a default value, I think.
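For reference, a config-file sketch of the new option; I'm assuming it lives under `downloader.http` like the other HTTP downloader settings, so check the docs for the exact path:

```json
{
    "downloader": {
        "http": {
            "chunk-size": 32768
        }
    }
}
```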
@mikf Thanks for adding the option. Meanwhile I profiled the code and narrowed the issue down to this:
```python
>>> import time
>>> def do():
...     t1 = time.perf_counter()
...     for i in range(1000):
...         time.sleep(0.01)
...     t2 = time.perf_counter()
...     dt = t2 - t1
...     print(f'{dt=}', f'exp={0.01*1000}')
...
>>> do()
dt=15.758051600001636 exp=10.0
```
Turns out, on my machine `time.sleep` is only accurate to about 10 ms. So each iteration of the for loop sleeps a tiny bit longer than it needs to, and this error accumulates over hundreds of loops. A bigger chunk size reduces the number of iterations and solves the problem. Here's the relevant line from the profiling output:
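To put rough numbers on that accumulation (illustrative values only; the per-call oversleep is inferred from the profile below, where 1170 calls spend ~14 ms each):

```python
# Back-of-the-envelope: how much pure oversleep a small chunk size adds.
file_size = 16 * 1024 * 1024    # 16 MiB download
chunk_size = 16 * 1024          # 16 KiB chunks -> 1024 iterations
oversleep = 0.004               # assume ~4 ms extra per time.sleep() call

loops = file_size // chunk_size
print(loops, loops * oversleep)  # 1024 loops, ~4.1 s of accumulated overhead
```

With 1 MiB chunks the same transfer needs only 16 iterations, so the overhead all but disappears.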
with a 16 KiB chunk size

```
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  1170   16.095    0.014   16.095    0.014  {built-in method time.sleep}
```
and with a 1 MiB chunk size

```
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    19    5.855    0.308    5.855    0.308  {built-in method time.sleep}
```
This is specific to my computer and probably not an issue on most modern machines. Once again, thanks for your help.
@trodiz I think I found a solution for inaccurate `time.sleep()` durations, but I'm not sure if it actually works outside of my own tests. Could you try the following patch and report back whether it works for you even with a small `chunk-size`?
`t2 = time.time()` was meant to counteract an inaccurate `time.sleep()` duration, but apparently it does the opposite.
```diff
diff --git a/gallery_dl/downloader/http.py b/gallery_dl/downloader/http.py
index 2e7e76e6..69dc4813 100644
--- a/gallery_dl/downloader/http.py
+++ b/gallery_dl/downloader/http.py
@@ -301,7 +301,7 @@ class HttpDownloader(DownloaderBase):
             if elapsed < expected:
                 # sleep if less time elapsed than expected
                 time.sleep(expected - elapsed)
-                t2 = time.time()
+                t2 += expected - elapsed
             t1 = t2
```
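For anyone skimming: a self-contained sketch of the idea behind that patch (a simplified stand-in for gallery-dl's actual loop; `read`/`write` are placeholders). Advancing `t2` by the intended sleep instead of re-reading the clock means any oversleep shrinks the next iteration's sleep budget rather than being lost:

```python
import time

def rate_limited_copy(read, write, rate, chunk_size=32768):
    """Copy chunks while holding the average throughput near `rate` bytes/s."""
    t1 = time.time()
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        write(chunk)

        t2 = time.time()
        elapsed = t2 - t1                 # how long this chunk really took
        expected = len(chunk) / rate      # how long it should have taken

        if elapsed < expected:
            time.sleep(expected - elapsed)
            t2 += expected - elapsed      # credit only the *intended* sleep;
                                          # oversleep is repaid next iteration
        t1 = t2
```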
Although it eliminated the inaccuracy in `time.time`, it still did not compensate for the error in `time.sleep`. Any sleep interval below 15 ms is not possible on Windows 10 unless I adjust the timer resolution manually.
Here's an example of a file I'm trying to download:
| FILE SIZE (bytes) | CHUNK SIZE (bytes) | LOOPS | LIMITED RATE (bytes/s) | UNLIMITED RATE (bytes/s) | SLEEP PER LOOP (ms) |
| --- | --- | --- | --- | --- | --- |
| 16,777,216 | 32,768 | 512 | 3,145,728 | 7,340,032 | 5.952 |
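The bullets below follow directly from that row; as a quick arithmetic check:

```python
# Reproducing the numbers from the table above / the bullets below.
file_size = 16_777_216            # bytes
chunk_size = 32_768               # bytes
rate_limit = 3_145_728            # bytes/s (-r 3M)
max_rate = 7_340_032              # bytes/s (link speed)

loops = file_size // chunk_size               # 512
target = file_size / rate_limit               # ~5.33 s at the limited rate
transfer = file_size / max_rate               # ~2.29 s at full speed
sleep_total = target - transfer               # ~3.05 s must be slept
print(loops, round(sleep_total / loops * 1000, 3))  # 512 loops, ~5.952 ms each
```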
- a `16MiB` file should take `5.33s` to finish at a rate of `3MiBps`
- the actual transfer only takes `2.29s` (`7MiBps` is the max speed I can get)
- so `3.05s` needs to be slept, or `5.95ms` per loop

Here's a solution that should work on all Windows machines:
```python
import ctypes
import platform

if platform.system() == "Windows":
    # NtSetTimerResolution(DesiredResolution, SetResolution, &CurrentResolution);
    # DesiredResolution is in 100 ns units, so 10000 requests a 1 ms timer.
    ntdll = ctypes.WinDLL("NTDLL.DLL")
    resolution = ctypes.c_ulong()
    ntdll.NtSetTimerResolution(10000, True, ctypes.byref(resolution))
```
This improves the timer resolution to 1 ms or lower, and everything runs as expected.
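As a quick sanity check (assuming the call above succeeded), re-running a sleep loop like the one from earlier should now land close to its expected total:

```python
import time

t1 = time.perf_counter()
for _ in range(1000):
    time.sleep(0.001)            # 1 ms sleeps, 1 s expected in total
print(time.perf_counter() - t1)  # close to 1 s with a ~1 ms timer,
                                 # vs ~15 s with the default ~15.6 ms timer
```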
The variable in question: https://github.com/mikf/gallery-dl/blob/0f9dfb7e6298864083b883b8a1f29754ab31f292/gallery_dl/downloader/http.py#L30

When you set a rate limit, e.g. `-r 3M`, instead of getting a download rate close to that specified value you get a much lower one. In my case it was throttled to 1 megabyte per second. The very small `chunk_size` creates overhead in each iteration of the for loop in the `_receive_rate` function. I'd recommend the much more commonly used chunk size of 1,048,576 bytes (1 MiB).