Download fails to complete sometimes, in conjunction with Amazon S3 sync

mike-whittaker-work commented 7 years ago

Many apologies for the lack of detail here, but I thought it better to note this even though I cannot supply the data needed to analyse the problem, in case someone with more detailed knowledge can spot the design "hole"!

I have noticed that zsync-curl file downloads sometimes fail to complete, reaching e.g. 96.3% and then repeatedly stalling and restarting. This happens when the hosted files are on an Amazon S3 server. The hypothesis is that the .zsync control file and the actual target file are not yet in step while they are being synced onto the server, so the download starts and then the target file changes "under our feet", meaning that the control file information (size etc.) is no longer valid.

More data might follow.

probonopd commented 7 years ago

Thanks @mike-whittaker-work - do you have a sample URL?

mike-whittaker-work commented 7 years ago

Sorry, this is what my apologies were about! This is on trial development systems which are not available to the public and which require certificates that are not public. I understand that this probably makes my report "not very helpful", but I thought it worth commenting in case, as stated above, it helps to indicate test use-cases.

probonopd commented 7 years ago

Can you check if the URL in https://github.com/probonopd/zsync-curl/issues/15 gives the same behavior, or whether these are different issues?

mike-whittaker-work commented 7 years ago

If my hypothesis is correct, then having zsync_curl re-acquire the .zsync control file whenever the Last-Modified header of the target file is later than the Last-Modified header of the .zsync control file would fix the problem. I do not yet have the data to support that, but in principle it makes sense to do that anyway!
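A minimal sketch of that comparison, in plain C with no libcurl dependency. It assumes both headers arrive in the usual "Thu, 19 Jan 2017 07:20:07 GMT" form; the names `parse_http_date`, `http_date_later` and `control_file_is_stale` are made up for illustration and are not part of zsync-curl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Broken-down HTTP date; fields are compared in this order. */
struct http_date {
    int year, month, day, hour, min, sec;
};

/* Parse a date such as "Thu, 19 Jan 2017 07:20:07 GMT".
 * Returns 0 on success, -1 if the string does not have that shape. */
int parse_http_date(const char *s, struct http_date *out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char wday[4], mon[4];
    const char *p;

    assert(s != NULL && out != NULL);
    if (sscanf(s, "%3[A-Za-z], %d %3[A-Za-z] %d %d:%d:%d GMT",
               wday, &out->day, mon, &out->year,
               &out->hour, &out->min, &out->sec) != 7)
        return -1;

    /* Month name must be three letters found at a 3-byte boundary. */
    p = strstr(months, mon);
    if (p == NULL || strlen(mon) != 3 || (p - months) % 3 != 0)
        return -1;
    out->month = (int)(p - months) / 3 + 1;
    return 0;
}

/* Returns 1 if a is strictly later than b, otherwise 0. */
int http_date_later(const struct http_date *a, const struct http_date *b)
{
    const int fa[6] = { a->year, a->month, a->day, a->hour, a->min, a->sec };
    const int fb[6] = { b->year, b->month, b->day, b->hour, b->min, b->sec };
    int i;

    for (i = 0; i < 6; i++) {
        if (fa[i] != fb[i])
            return fa[i] > fb[i];
    }
    return 0;
}

/* Hypothetical policy check: 1 means "re-fetch the .zsync control file",
 * 0 means "carry on", -1 means a header could not be parsed. */
int control_file_is_stale(const char *control_last_modified,
                          const char *target_last_modified)
{
    struct http_date control, target;

    if (parse_http_date(control_last_modified, &control) != 0 ||
        parse_http_date(target_last_modified, &target) != 0)
        return -1;
    return http_date_later(&target, &control);
}
```

A caller would pass in the two Last-Modified values it saw on the control-file response and on the first payload response, and treat a return value of 1 as the cue to start again from the control file.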

mike-whittaker-work commented 7 years ago

It looks as if it gets stuck in fetch_remaining_blocks() in client.c and keeps retrying. I cannot yet determine why it does not leave the loop, but I notice that when things go wrong, the download file has just changed in size, so maybe, based on the information it has, the loop can never terminate.

In that case I would consider adding some kind of retry counter which, if exceeded, would restart the entire download, including the initial fetch of the .zsync control file.
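One possible shape for such a counter, as a sketch only. It does not reflect the real structure of fetch_remaining_blocks(); the type and function names (`retry_state`, `next_action`, `on_incomplete_pass`) and both limits are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* What the caller should do after a pass that left blocks missing. */
enum next_action {
    ACTION_RETRY_BLOCKS,   /* ask again for the missing ranges           */
    ACTION_RESTART_ALL,    /* re-fetch the .zsync control file and retry */
    ACTION_GIVE_UP         /* report an error instead of looping forever */
};

/* Counters kept across passes; start with both fields at zero. */
struct retry_state {
    int block_retries;     /* retries since the last full restart */
    int full_restarts;     /* full restarts performed so far      */
};

/* Decide the next step after an incomplete pass. Both limits are
 * hypothetical tuning knobs supplied by the caller. */
enum next_action on_incomplete_pass(struct retry_state *st,
                                    int max_block_retries,
                                    int max_full_restarts)
{
    assert(st != NULL);

    if (st->block_retries < max_block_retries) {
        st->block_retries++;
        return ACTION_RETRY_BLOCKS;
    }
    if (st->full_restarts < max_full_restarts) {
        st->full_restarts++;
        st->block_retries = 0;   /* fresh retry budget after a restart */
        return ACTION_RESTART_ALL;
    }
    return ACTION_GIVE_UP;
}
```

With limits of 2 retries and 1 restart, the sequence of decisions is retry, retry, restart, retry, retry, give up, so the loop is guaranteed to terminate however the server behaves.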

mike-whittaker-work commented 7 years ago

A colleague has suggested that it is the S3 syncs that cause glitches in the reported content size. zsync-curl appears to be vulnerable to this kind of glitch in a non-benign way, and would need some kind of reactive workaround to correct this, as I described above.
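One way to detect such a glitch, sketched under assumptions: compare the Content-Range header of a single-range 206 response with the range that was requested and with the total length the client expects (presumably the length recorded in the .zsync control file). The function name `content_range_matches` and its parameters are hypothetical, and multi-range responses are not handled.

```c
#include <assert.h>
#include <stdio.h>

/* Check a Content-Range value such as "bytes 15-4427/4428" against the
 * requested byte range and the expected total length of the target file.
 * Returns 1 if everything agrees, 0 on a mismatch, and -1 if the value
 * cannot be parsed (including the "bytes 0-9 / *" unknown-total form). */
int content_range_matches(const char *value,
                          long long req_first,
                          long long req_last,
                          long long expected_total)
{
    long long first, last, total;

    assert(value != NULL);
    if (sscanf(value, "bytes %lld-%lld/%lld", &first, &last, &total) != 3)
        return -1;

    return first == req_first &&
           last == req_last &&
           total == expected_total;
}
```

A result of 0 would be a natural trigger for abandoning the current pass and re-fetching the control file, rather than retrying the same ranges.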

probonopd commented 7 years ago

Do I understand you right that you suspect the error only occurs if a previous payload file with a different file size existed in the same location? If this is the hypothesis, then

A payload file + zsync file uploaded for the first time should work
A payload file + zsync file uploaded as a replacement for an older version (using the same name) should NOT work

Correct?

mike-whittaker-work commented 7 years ago

Not quite - I do not know the exact mechanism, I just see the resulting change in Content-Length.

I just left a script running in a loop, with CURLOPT_VERBOSE=1 output enabled, downloading the same file until I saw it "stick", and then looked at the log output.

I do not have the remit to take this much further now since the project is dropping the use of zsync-curl in light of its misbehaviour against the Amazon S3 servers.

Hence I am just updating this in the hope that it will help someone else!

Ideally I would also instrument fetch_remaining_blocks() to find out more about why it neither completed nor errored out.

probonopd commented 7 years ago

OK, thanks for the pointer, anyway! Hopefully we can figure it out from here soon.

mike-whittaker-work commented 7 years ago

Excerpt from log when transfer gets stuck (some text / IP addresses have been changed): [my custom version that supplies certs to TLS]


Setting CURLOPT_CAINFO to /etc/ssl/certs/ca-roots.crt
* Hostname was NOT found in DNS cache
*   Trying 52.222.111.111...
* Connected to blah.fw.blah.info (52.222.111.111) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-roots.crt
CApath: none
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
*    subject: CN=blah.fw.blah.info
*    start date: 2016-12-22 00:00:00 GMT
*    expire date: 2018-01-22 12:00:00 GMT
*    subjectAltName: blah.fw.blah.info matched
*    issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
*    SSL certificate verify ok.
GET /stg/box/Packages.gz.zsync HTTP/1.1
Host: blah.fw.blah.info
Accept: */*

< HTTP/1.1 200 OK
< Content-Type: application/octet-stream
< Content-Length: 482
< Connection: keep-alive
< Date: Thu, 19 Jan 2017 15:04:37 GMT
< Last-Modified: Thu, 19 Jan 2017 07:20:07 GMT
< ETag: "e0422e93e01b0fa3e033ffabdf4fccb124629"8
< x-amz-meta-s3cmd-attrs: uid:1002/gname:zoob/uname:zoob/gid:1002/mode:33204/mtime:1484653263/atime:1484803425/md5:e04e936e01b0fae07733ffab8dffccb12429/ctime:1484803425
< Accept-Ranges: bytes
Server AmazonS3 is not blacklisted
< Server: AmazonS3
< Age: 70
< X-Cache: Hit from cloudfront
< Via: 1.1 c3f05865f6282a5aa7a0ea4c56a36ec92d60.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: emm_HWUZ3YojJy5BjsI2WE23Lwh_leoemzBLFiazSYM77PKAQFrOksHiPg==
<

Connection #0 to host blah.fw.blah.info left intact

Leaving referer as https://blah.fw.blah.info/stg/box/Packages.gz.zsync

Setting redirected to https://blah.fw.blah.info/stg/box/Packages.gz.zsync

Target Deps No relevent local data found - I will be downloading the whole file. If that's not what you want, CTRL-C out. You should specify the local file is the old version of the file to download with -i (you might have to decompress it with gzip -d first). Or perhaps you just have no data that helps download the file

fetch from Packages.gz ### USE THE REDIRECTED URL FROM NOW ON

)

make_url_absolute(https://blah.fw.blah.info/stg/box/Packages.gz.zsync, Packages.gz)

Setting CURLOPT_CAINFO to /etc/ssl/certs/ca-roots.crt

Redirected payload URL: https://blah.fw.blah.info/stg/box/Packages.gz

make_url_absolute(https://blah.fw.blah.info/stg/box/Packages.gz.zsync, https://blah.fw.blah.info/stg/box/Packages.gz)

downloading from https://blah.fw.blah.info/stg/box/Packages.gz: -------------------- 0.0% Setting CURLOPT_CAINFO to /etc/ssl/certs/ca-roots.crt

Hostname was NOT found in DNS cache
Trying 52.222.111.111...
Connected to blah.fw.blah.info (52.222.111.111) port 443 (#0)
successfully set certificate verify locations:
CAfile: /etc/ssl/certs/ca-roots.crt CApath: none
SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
Server certificate:
subject: CN=blah.fw.blah.info
start date: 2016-12-22 00:00:00 GMT
expire date: 2018-01-22 12:00:00 GMT
subjectAltName: blah.fw.blah.info matched
issuer: C=US; O=Amazon; OU=Server CA 1B; CN=Amazon
SSL certificate verify ok.

GET /stg/box/Packages.gz HTTP/1.1
Range: bytes=15-4427
Host: blah.fw.blah.info
Accept: */*

< HTTP/1.1 206 Partial Content
< Content-Type: application/gzip
< Content-Length: 4261
< Connection: keep-alive
< Date: Thu, 19 Jan 2017 15:04:38 GMT
< Last-Modified: Tue, 17 Jan 2017 11:56:43 GMT
< ETag: "690a02701c371253372a32620746175364401e50"
< x-amz-meta-s3cmd-attrs: uid:1002/gname:zoob/uname:zoob/gid:1002/mode:33277/mtime:1484653263/atime:1484653793/md5:690aa0270c37125972a326206175301e50/ctime:1484653632
< Accept-Ranges: bytes
Server AmazonS3 is not blacklisted
< Server: AmazonS3
< Age: 71
< Content-Range: bytes 15-4275/4435
< X-Cache: Hit from cloudfront
< Via: 1.1 b4931726684b8709924723090d6ce0d62daa5.cloudfront.net (CloudFront)
< X-Amz-Cf-Id: SS6_oIl66b6B78dT49oImd3wTrOHmElrei6eQyxnXLx6X9Rf1gHkebeaNsw==
<

Connection #0 to host blah.fw.blah.info left intact

-------------------- 0.0% 0.0 kBps

fetch from Packages.gz ### USE THE REDIRECTED URL FROM NOW ON

)

make_url_absolute(https://blah.fw.blah.info/stg/box/Packages.gz.zsync, Packages.gz)

Setting CURLOPT_CAINFO to /etc/ssl/certs/ca-roots.crt

Redirected payload URL: https://blah.fw.blah.info/stg/box/Packages.gz

make_url_absolute(https://blah.fw.blah.info/stg/box/Packages.gz.zsync, https://blah.fw.blah.info/stg/box/Packages.gz)

downloading from https://blah.fw.blah.info/stg/box/Packages.gz: ##################-- 92.9% Setting CURLOPT_CAINFO to /etc/ssl/certs/ca-roots.crt

Found bundle for host blah.fw.blah.info: 0x3cbe8
Re-using existing connection! (#0) with host blah.fw.blah.info
Connected to blah.fw.blah.info (52.222.111.111) port 443 (#0)
GET /stg/box/Packages.gz HTTP/1.1
Range: bytes=15-215,4010-4427
Host: blah.fw.blah.info

TheAssassin commented 6 years ago

Please switch to zsync2, and report bugs there if you find some. Closing as this repository is no longer developed.
