rockdaboot / wget2

The successor of GNU Wget. Contributions preferred at https://gitlab.com/gnuwget/wget2. But accepted here as well 😍
GNU Lesser General Public License v3.0

Incomplete Download #245

Open HtnrxUNc opened 3 years ago

HtnrxUNc commented 3 years ago

Hello. I use "wget2" to download directories that contain many large files (usually around 3 GB each). While downloading, I sometimes see an odd message such as "just got 29918780 out of 3728234580 bytes". It is easy to spot because it is printed in red and is not part of the progress display. When "wget2" reported that the download was complete, I compared the download directory with the source directory and found that some files differed in size, so I deleted the directory and started the download again.

My OS: Ubuntu 20
"wget2" version: 2.0.0

I can't predict or reproduce the problem right now, but I'm still downloading another directory and expect to run into it again later. Can I write the log to a file with "wget2 -o log.log"? I understand that this is not enough information to find the cause, so I'll leave this issue idle for now and update it when I have more.

HtnrxUNc commented 3 years ago

By the way, does "wget2" have the ability to check if the source directory (or file) is the same as the download directory (or file)?

HtnrxUNc commented 3 years ago

Update: I ran into this problem again and saved the log output from the terminal window. Unfortunately, the log I wrote to a file with "wget2 -o log.log" is mostly garbled (wrong encoding), and I don't know how to write the log correctly. The log is as follows (there may be odd line breaks, but that is probably caused by my "ssh" software):

Failed to make directory '127.0.0.1/1 1 1 1 /Python' (errno=5)
Failed to open '127.0.0.1/1 1 1 1 /Python/index.html' (2)
Failed to make directory '127.0.0.1/1 1 1 1 /Python/Python 3 Deep Dive (Part
Failed to open '127.0.0.1/1 1 1 1 /Python/Python 3 Deep Dive (Part 2 - Iterat
Failed to make directory '127.0.0.1/1 1 1 1 /Python/Python 3 Deep Dive (Part
Failed to open '127.0.0.1/1 1 1 1 /Python/Python 3 Deep Dive (Part 4 - OOP) 2
Just got 121596333 of 231858443 bytes
Just got 60763527 of 76225239 bytes

HtnrxUNc commented 3 years ago

I suspect this may be related to the fact that I am downloading the directory directly to "google drive" via "rclone"; "google" probably limits the request rate of its "google drive api". I will download the same directory again later, using locally mounted storage as the download directory, to see whether the problem occurs again.

rockdaboot commented 3 years ago

To get ASCII-encoded output, LC_ALL=C wget2 ... should do it. The log output is also easier to read if you limit wget2 to one thread, with --num-threads=1. Also add --debug (or -d) to the command line.

By the way, does "wget2" have the ability to check if the source directory (or file) is the same as the download directory (or file)?

If you do an incremental download (-c), also use -N. However, wget2 can only apply a heuristic, because it has to rely on the HTTP protocol and on the server behaving correctly.
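As a rough illustration of what a check of this kind can look like, here is a minimal Python sketch. It is not how wget2 implements -N; it only compares a local file's size and modification time with the Content-Length and Last-Modified values that a server reports. The function name `looks_up_to_date` is made up for this example.

```python
import os
from email.utils import parsedate_to_datetime


def looks_up_to_date(path, content_length, last_modified):
    """Heuristic only: True if the local file has the reported size
    and is not older than the server's Last-Modified timestamp.

    content_length and last_modified are the raw header values (strings).
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    # A size mismatch means the local copy is truncated or different.
    if st.st_size != int(content_length):
        return False
    # HTTP dates such as "Tue, 29 Dec 2020 15:43:50 GMT" parse to an aware datetime.
    remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    return st.st_mtime >= remote_mtime
```

A matching size and timestamp do not prove that the content is identical, which is why this can only ever be a heuristic.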

Just got 60763527 of 76225239 bytes

This happens when the server says "expect 76225239 bytes to come" but the download stops after 60763527 bytes. This could indicate a flaky connection.
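To find every such message in a large log, a short script may be easier than reading it by eye. The following is a minimal Python sketch that assumes the message format shown above ("Just got X of Y bytes"); the function name `find_short_reads` is hypothetical.

```python
import re

# Matches messages such as "Just got 60763527 of 76225239 bytes".
SHORT_READ = re.compile(r"Just got (\d+) of (\d+) bytes")


def find_short_reads(log_text):
    """Return a list of (received, expected, missing) byte counts,
    one tuple per 'Just got X of Y bytes' message in log_text."""
    results = []
    for match in SHORT_READ.finditer(log_text):
        received = int(match.group(1))
        expected = int(match.group(2))
        results.append((received, expected, expected - received))
    return results
```

For the message quoted above, the function reports 15461712 missing bytes.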

Failed to make directory '127.0.0.1/1 1 1 1 /Python' (errno=5)

Errno 5 is an "I/O error". This should not happen - something might be wrong on your side. If you store the files on a network file system (NFS, CIFS, ...), check your network. If you store them on a local file system, do a file system check (fsck). (These are just wild guesses from me.)
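Errno numbers that appear in a log can be looked up with Python's standard library. This is a small sketch, assuming a Linux system (the numeric values are platform-specific); `describe_errno` is a name chosen for this example.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux, 5 is EIO; the message text comes from the C library.
    print(describe_errno(5))
```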

HtnrxUNc commented 3 years ago

OK, thank you very much for your reply. I am now using "wget2 -m -np -t 0 -c -d --num-threads=1 -o log.log http://a.com:80/b/c/" to download the source directory. I will let it run in the background; the window is printing log output normally. (However, my local log file (log.log) is still 0 B and is empty when I open it. I'm not sure whether this is expected, because "wget2" is running in "-d" mode.) I think your answer may be exactly why my download failed, so I'll post the logs later if I hit the error again. I'll leave the issue idle for now. If I later confirm that the file storage system is the cause, I'll explain why and close the issue. If it turns out that this may be a bug, I will get back to you. Thank you for your help and patience. :)

rockdaboot commented 3 years ago

wget2 -m -np -t 0 -c -d --num-threads=1 -o log.log http://a.com:80/b/c/

Oh sorry. It must be --max-threads=1 instead of --num-threads=1.
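For anyone who wants to script this, here is a minimal Python sketch that assembles the corrected command and runs it with LC_ALL=C. The option list is taken from the command in this thread; the function names and the default log file name are assumptions made for the example.

```python
import os
import subprocess


def build_wget2_command(url, log_file="log.log"):
    """Build the argument list used in this thread, with --max-threads=1."""
    return [
        "wget2", "-m", "-np", "-t", "0", "-c", "-d",
        "--max-threads=1", "-o", log_file, url,
    ]


def run_with_c_locale(url, log_file="log.log"):
    """Run wget2 with LC_ALL=C and return its exit code."""
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(build_wget2_command(url, log_file), env=env, check=False)
    return result.returncode
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain spaces or special characters.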

HtnrxUNc commented 3 years ago

Thank you for the tip. I stopped the command I had started earlier:

wget2 -m -np -t 0 -c -d --num-threads=1 -o log.log http://a.com:80/b/c/

Then I ran the new command with the corrected option:

wget2 -m -np -t 0 -c -d --max-threads=1 -o log.log http://a.com:80/b/c/

I tell "wget2" that this is a URL by adding two spaces before the URL. The download directory now contains the correct files (only partially, because "wget2" is still downloading the rest). The log file "log.log" does not seem to contain any record of what "wget2" did (probably because I used "-d"). I'll keep an eye on the log and let you know what happens.

Thank you for your professionalism and patience. :)

HtnrxUNc commented 3 years ago

I added the option "--max-threads=1" to "wget2". The download is slower, but the log is much easier to read. I now have a log of what "wget2" does, and the window no longer prints much output. I'm not sure whether the window printed a red error message: the log is large, and I can't be sure of checking all of it for errors by eye rather than with a tool or a search. When "wget2" finishes, I will check whether the window shows a red error. If it doesn't, I will use the same options to download a few more directories that are small enough to verify by eye. Repeated attempts may or may not trigger the problem again. If the problem does not recur, we can be sure that my local storage system was causing the errors for "wget2". If it does recur, I will have to go through the "wget2" log by eye.

HtnrxUNc commented 3 years ago

Sorry, I'll re-fix the log format.

Hi, this is part of the log. When I search the recorded log for the file name, it shows 24 results, so this may be only part of them. I will post more of the log later. (The file names and paths in the log have been replaced; let me know if you need the original version.)

02.113132.589 span/@class=file-modified col-md-3 col-sm-4 hidden-xs text-right
02.113132.589 ='2021-09-11T15:29:42.236Z'
02.113132.589 a/@class=clearfix
02.113132.589 a/@href=/b/c.mp4
02.113132.589 a/@target=_blank
02.113132.589 a/@title=c.mp4
02.113132.589 div/@class=row
02.113132.589 span/@class=file-name col-md-7 col-sm-6 col-xs-8
02.113132.589 span/@data-type=video
02.113132.589 span/@data-thumb=/b/c.mp4?thumb
02.113132.589 span/@class=file-thumb
02.113132.589 =' '
02.113132.589 div/@class=file-thumb-img
02.113132.589 div/@data-src=/b/c.mp4?thumb
02.113132.589 i/@class=ic ic-video
02.113132.589 span/@class=file-name-title
02.113132.589 ='c.mp4'
02.113132.589 span/@class=file-size col-md-2 col-sm-2 col-xs-4 text-right
02.113132.589 ='1.83 GB'
02.113132.589 span/@class=file-modified col-md-3 col-sm-4 hidden-xs text-right
02.113132.589 ='2021-09-11T15:29:42.181Z'
02.113132.589 a/@class=clearfix
02.113132.589

two

02.113132.591 host_add_job: qsize 13 host-qsize=13
02.113132.591 url = /b/c.mp4
02.113132.591 path /b/c.mp4 ->
02.113132.591 b/c.mp4
02.113132.591 2 http://a.com:80/b/c.mp4
Adding URL: http://a.com:80/b/c.mp4
02.113132.591 local filename = '/root/GoogleDrive/alyun//a.com/b/c.mp4'
02.113132.591 host_add_job: job fname /root/GoogleDrive/alyun//a.com/b/c.mp4
02.113132.591 host_add_job: 0x7f66c8022c50 http://a.com:80/b/c.mp4
02.113132.591 host_add_job: qsize 14 host-qsize=14

three

02.120830.347 keep_alive=1
02.120830.348 _host_remove_job: 0x7f66c0035630
02.120830.349 host_remove_job: qsize=8 host->qsize=8
02.120830.349 [0] action=1 pending=0 host=0x55e637dd7580
02.120830.349 dequeue job http://a.com:80/b/c.mp4
02.120830.349 reuse connection a.com
[0] Downloading 'http://a.com:80/b/c.mp4' ...
02.120830.349 main: wake up
02.120831.536 cookie_create_request_header for host=a.com path=b/c.mp4
02.120831.536 path_match(/,/b/c.mp4)
02.120831.536 found USER_SID=6210jAuEKnKRFFTASKy6djnOhIUqLY1X
02.120831.537 path_match(/,/b/c.mp4)
02.120831.537 found locale=en-us
02.120831.537 # sent 629 bytes:
GET /b/c.mp4 HTTP/1.1
Host: a.com
Accept-Encoding: gzip, deflate, bzip2, xz, lzma
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: wget2/2.0.0
Connection: keep-alive
Referer: http://a.com:80/d
Cookie: USER_SID=6210jAuEKnKRFFTASKy6djnOhIUqLY1X;

four

02.120831.537 [0] action=2 pending=1 host=0x55e637dd7580
02.120831.537 ### req 0x7f66c001c760 pending requests = 1
02.120831.537 host_increase_failure: a.com failures=1
02.120831.537 [0] action=3 pending=1 host=0x55e637dd7580
02.120831.537 closing connection
02.120831.537 released job http://a.com:80/b/c.mp4
02.120831.537 [0] action=1 pending=0 host=0x0
02.120831.537 host a.com is paused 1000ms
02.120831.537 main: wake up
02.120832.537 [0] action=1 pending=0 host=0x0
02.120832.537 dequeue job http://a.com:80/b/c.mp4
02.120832.537 Found dns cache entry a.com:80
02.120832.537 trying 127.0.0.1:80...
02.120832.538 established connection a.com
[0] Downloading 'http://a.com:80/b/c.mp4' ...
02.120832.539 cookie_create_request_header for host=a.com path=b/c.mp4
02.120832.539 path_match(/,/b/c.mp4)
02.120832.539 found USER_SID=6210jAuEKnKRFFTASKy6djnOhIUqLY1X
02.120832.540 path_match(/,/b/c.mp4)
02.120832.540 found locale=en-us
02.120832.540 # sent 629 bytes:
GET /b/c.mp4 HTTP/1.1
Host: a.com
A

Please note that although I resolve this domain to "127.0.0.1", that address is only a reverse proxy such as "nginx".

five

02.120832.541 [0] action=2 pending=1 host=0x55e637dd7580
02.120832.541 ### req 0x7f66c001ca70 pending requests = 1
02.120835.034 # got header 714 bytes:
HTTP/1.1 200 OK
Vary: Origin
Content-Range: bytes 0-1968375379/1968375380
content-length: 1968375380
content-type: video/mp4
server: AliyunOSS
date: Tue, 02 Nov 2021 16:08:37 GMT
connection: keep-alive
x-oss-request-id: 61816285548A293434DE7258
accept-ranges: bytes
etag: "047C682D50CD29EA5235A7E65E2FD041-376"
last-modified: Tue, 29 Dec 2020 15:43:50 GMT
x-oss-object-type: Multipart
x-oss-hash-func: SHA-1
x-oss-hash-value: 4898886F2631481ABDD489B382F3666BA8381042
x-oss-hash-crc64ecma: 5103362350473163937
x-oss-storage-class: Standard
content-disposition: attachment; filename*=UTF-8''e.mp4
content-md5: ZqU4X/Bo2uXhEqmrzRIFnA==
x-oss-server-time: 7

02.120835.035 mkdir(/root)=-1 errno=17
02.120835.035 mkdir(/root/GoogleDrive)=-1 errno=17
02.120835.035 mkdir(/root/GoogleDrive/alyun)=-1 errno=17
02.120835.035 mkdir(/root/GoogleDrive/alyun/)=-1 errno=17
02.120835.035 mkdir(/root/GoogleDrive/alyun//a.com)=-1 errno=17
02.120835.035 mkdir(/root/GoogleDrive/alyun//a.com/g)=-1 errno=17
02.120835.035 mkdir(/root/GoogleDrive/alyun//a.com/d)=-1 errno=17
02.120835.036 mkdir(/root/GoogleDrive/alyun//a.com/d/f)=-1 errno=17
Saving '/root/GoogleDrive/alyun//a.com/d/f/c.mp4'
02.120835.045 method 2
Just got 1656034149 of 1969327715 bytes
HTTP response 200 OK
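The response headers in this log include a content-md5 value, which could in principle be used to check a finished download. The Python sketch below assumes that the header holds the base64-encoded MD5 digest of the complete file body. Whether this particular server computes it that way (for example for objects uploaded in multiple parts) is not confirmed anywhere in this thread, and the function names are hypothetical.

```python
import base64
import hashlib


def file_md5_base64(path, chunk_size=1024 * 1024):
    """Return the base64-encoded MD5 digest of a file, read in chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_content_md5(path, content_md5):
    """Compare a local file with a Content-MD5 header value.
    Assumption: the header is the base64 MD5 of the whole body."""
    return file_md5_base64(path) == content_md5.strip()
```

A truncated file, such as one that stopped at 1656034149 bytes, would fail this comparison even if a later size check were skipped.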

HtnrxUNc commented 3 years ago

As far as I can tell, there are no other log entries for this file (but I'm not sure).