(I don't have any public URL to test with unfortunately)
@rchl From what I understand, if you queue a termination, the other files are not downloaded. The logs tell you this is because of the termination.
Yes, but that is irrelevant to the issue. It's the files that are already downloading that get their speed boosted.
I've set up a test page with a couple of links to big (200 MB) files:
So to test you can do:
httrack https://c8c3dbef-cee9-4765-b7f7-21b5e2ed6d1f.htmlpasta.com/ -O "test-download" -%v --disable-security-limits --max-rate=0 -%e1
The results I see while downloading, before triggering the cancel, show around 2 MB/s:
Bytes saved: 21,10MiB Links scanned: 2/7 (+1)
Time: 16s Files written: 2
Transfer rate: 1,87MiB/s (1,31MiB/s) Files updated: 3
Active connections: 4 Errors: 0
Current job: waiting (throttle)
receive - ipv4.download.thinkbroadband.com/200MB.zip?1 8,16MiB / 200,00MiB
receive - ipv4.download.thinkbroadband.com/200MB.zip?3 6,01MiB / 200,00MiB
receive - ipv4.download.thinkbroadband.com/200MB.zip?2 6,92MiB / 200,00MiB
request - https://www.google-analytics.com/analytics.js 121B / 8,00KiB
After cancelling (Ctrl+C), the speed increases to around 5 MB/s for the files that are already downloading:
Bytes saved: 90,48MiB Links scanned: 5/11 (+0)
Time: 31s Files written: 3
Transfer rate: 4,90MiB/s (2,91MiB/s) Files updated: 3
Active connections: 3 Errors: 1
Current job: receiving files
receive - ipv4.download.thinkbroadband.com/200MB.zip?1 37,47MiB / 200,00MiB
receive - ipv4.download.thinkbroadband.com/200MB.zip?3 29,44MiB / 200,00MiB
receive - ipv4.download.thinkbroadband.com/200MB.zip?2 23,53MiB / 200,00MiB
To me it seems like there is some queue-handling code that bottlenecks download speeds.
NOTE: After canceling the transfer, you need to wait for at least one file to finish downloading before the transfer boost happens. Only then do you see the Current job status change from "waiting (throttle)" to "receiving files" and transfer speeds increase.
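To make that suspicion concrete, here is a minimal standalone sketch in C of a generic rate-cap wait loop. It is not HTTrack's actual code; RATE_CAP and CHUNK are made-up names and the loop only simulates receiving data. The point is that while such a loop runs, every active connection is held near the cap, and as soon as it stops running the remaining transfers are free to go at full speed, which would match the behaviour described above.

/* Illustrative only -- NOT HTTrack source. A throttle loop that sleeps
 * whenever the measured rate exceeds a hypothetical cap, producing a
 * "waiting (throttle)"-style state. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define RATE_CAP (2 * 1024 * 1024)  /* hypothetical cap: 2 MiB/s           */
#define CHUNK    (64 * 1024)        /* bytes "received" per loop iteration */

int main(void) {
    long long total = 0;
    time_t start = time(NULL);

    for (int i = 0; i < 200; i++) {
        total += CHUNK;  /* pretend one chunk just arrived on some socket */

        /* throttle: while the measured rate is above the cap, sleep
         * instead of reading more data -- the "waiting" state */
        for (;;) {
            double elapsed = difftime(time(NULL), start);
            if (elapsed < 1.0)
                elapsed = 1.0;
            if (total / elapsed <= RATE_CAP)
                break;
            printf("waiting (throttle): %.2f MiB/s\n",
                   total / elapsed / (1024.0 * 1024.0));
            usleep(100 * 1000);  /* back off 100 ms, then re-check */
        }
    }
    printf("done: %.1f MiB in ~%.0f s\n",
           total / (1024.0 * 1024.0), difftime(time(NULL), start));
    return 0;
}

Compiled with a plain cc, this settles at roughly the cap regardless of how fast the simulated "network" could deliver data.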
@rchl I'm experiencing the same thing. It's painfully slow through httrack, about 30 KiB/s per socket/open connection, but when I download a video file through uget, it's about 3-5 MiB/s. Very frustrating. Just downloading the plain HTML is also painfully slow, and I think more of the program's runtime is spent in "waiting (throttle)" mode than actually downloading.
I run httrack 3.49-2 from the CLI in bash under XFCE on top of OpenSUSE Leap 15.
I'm having issues too. I'm in HK and my own VPS is in Singapore. Pages load in 20ms. I get around 1KiB/s with the following:
"httrack 'https://{$docset->url()}' \
--path 'storage/{$docset->code()}' \
--connection-per-second=50 \
--sockets=80 \
--keep-alive \
--display \
--verbose \
--advanced-progressinfo \
--disable-security-limits \
-s0 \
-o0 \
-F 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36' \
--max-rate=0 \
--depth=5"
wget -r -l inf --html-extension --convert-links --page-requisites --adjust-extension --load-cookies '/Users/user/Library/Application Support/Google/Chrome/Default/Cookies' **<http/https link>**
Download speeds are seriously bottlenecked by the code somewhere. Using a command like:
and downloading, let's say, 4 big video files at the same time, I'm getting around 400 KB/s total download speed, while manually downloading those files (with wget or a browser) I can easily get over 10 MB/s.
It's not due to the network or the fact that 4 files are being downloaded at the same time. Why do I think so? Because canceling the mirroring with Ctrl+C fixes download speeds as soon as one of the files completes.
Steps:
What happens: All 3 remaining files suddenly get "super speed" and finish quickly.
That makes me think there is some very socket/CPU-intensive code running while files are being downloaded that makes download rates go to hell. When termination is triggered (I assume it's something to do with opt->state.stop handling), that code probably is no longer running and download speeds are no longer bottlenecked.
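As a purely hypothetical illustration of that theory (not HTTrack's real implementation; heavy_bookkeeping and the stop flag here are stand-ins for whatever work opt->state.stop actually gates), the toy program below does expensive per-iteration work only while stop is false. The same amount of "received" data then takes far longer before the flag is set than after, which is the kind of jump described above.

/* Illustrative only -- NOT HTTrack source. Shows how skipping heavy
 * per-iteration work once a stop flag is set would boost throughput. */
#include <stdio.h>
#include <stdbool.h>
#include <time.h>

static volatile bool stop = false;   /* stand-in for something like opt->state.stop */

/* stand-in for expensive per-iteration queue/bookkeeping work */
static void heavy_bookkeeping(void) {
    volatile double x = 0.0;
    for (int i = 0; i < 2000000; i++)
        x += i * 0.5;
}

int main(void) {
    const int chunk = 64 * 1024;     /* bytes "received" per iteration */
    long long bytes = 0;
    clock_t t0 = clock();

    for (int i = 0; i < 400; i++) {
        if (i == 200) {
            /* pretend the user hit Ctrl+C halfway through */
            stop = true;
            printf("before stop: %.1f MiB simulated in %.2f s\n",
                   bytes / (1024.0 * 1024.0),
                   (double)(clock() - t0) / CLOCKS_PER_SEC);
            bytes = 0;
            t0 = clock();
        }
        if (!stop)
            heavy_bookkeeping();     /* skipped once termination is requested */
        bytes += chunk;              /* "receive" one chunk */
    }

    printf("after stop:  %.1f MiB simulated in %.2f s\n",
           bytes / (1024.0 * 1024.0),
           (double)(clock() - t0) / CLOCKS_PER_SEC);
    return 0;
}

If the real bottleneck has this shape, the fix would be in whatever that per-iteration work is, not in the network handling itself.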