munki / macadmin-scripts

Scripts of possible interest to macOS admins

Fix HTTP resume #84

Closed craig65535 closed 3 years ago

craig65535 commented 3 years ago

If you start the script, select a release, and interrupt the download, the transfer of the partially downloaded file is not resumed when the script is restarted.

Here, I'm stopping a download with ^C:

Choose a product to download (1-14): 2
Downloading http://swcdn.apple.com/content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 25  475M   25  119M    0     0  9856k      0  0:00:49  0:00:12  0:00:37 9839k^CTraceback (most recent call last):
  File "installinstallmacos.py", line 633, in <module>
    main()
  File "installinstallmacos.py", line 582, in main
    catalog, product_id, args.workdir, ignore_cache=args.ignore_cache)
  File "installinstallmacos.py", line 463, in replicate_product
    attempt_resume=(not ignore_cache))
  File "installinstallmacos.py", line 282, in replicate_url
    subprocess.check_call(curl_cmd)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 185, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 172, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1099, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 125, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
Craig-Davisons-Mac-mini-2:macOS crg$ find content/ -name BaseSystem.dmg -ls
22409155   263816 -rw-r--r--    1 root             staff            131301376 24 Jan 23:02 content//downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg

And, on restart:

Choose a product to download (1-14): 2
Downloading http://swcdn.apple.com/content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg...
** Resuming transfer from byte position 131301376
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
$ find content/ -name BaseSystem.dmg -ls
22409155   263816 -rw-r--r--    1 root             staff            131301376 24 Jan 23:02 content//downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg

We can see that curl didn't actually complete the download.

It looks like the script downloads the file with a command like this:

/usr/bin/curl -fL --create-dirs -o ./content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg --compressed -z ./content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg -C - http://swcdn.apple.com/content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg

Because -z is specified with the path of the local file, curl adds an If-Modified-Since header carrying that file's modification time. Because the file on the server side is older than the date in that header, the web server returns 304 and no file contents.

I tried running the curl command above with -v to illustrate this:

$ sudo /usr/bin/curl -v -fL --create-dirs -o ./content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg --compressed -z ./content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg -C - http://swcdn.apple.com/content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg
** Resuming transfer from byte position 131301376
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 8.253.151.116...
* TCP_NODELAY set
* Connected to swcdn.apple.com (8.253.151.116) port 80 (#0)
> GET /content/downloads/26/37/001-68446/r1dbqtmf3mtpikjnd04cq31p4jk91dceh8/BaseSystem.dmg HTTP/1.1
> Host: swcdn.apple.com
> Range: bytes=131301376-
> User-Agent: curl/7.64.1
> Accept: */*
> Accept-Encoding: deflate, gzip
> If-Modified-Since: Mon, 25 Jan 2021 06:02:19 GMT
> 
< HTTP/1.1 304 Not Modified
< Date: Sat, 16 Jan 2021 16:03:10 GMT
< Connection: keep-alive
< Cache-Control: public, max-age=2592000
< ETag: "76A63F2ADA06FD857D13821E2F958F76-8"
< Expires: Mon, 15 Feb 2021 16:03:10 GMT
< Last-Modified: Tue, 10 Nov 2020 23:21:44 GMT
< Server: ATS/8.1.1
< CDNUUID: 8cbd6117-5047-4bae-b7ef-e8ba0e7a8238-637869255
< X-Apple-MS-Content-Length: 498625205
< x-apple-request-uuid: 2fc47886-62e8-4d41-83f6-f1d126bf464d,2fc47886-62e8-4d41-83f6-f1d126bf464d
< X-iCloud-Content-Length: 498625205
< x-icloud-versionid: 7f544120-23ab-11eb-b1a9-248a078d3ec5
< X-Responding-Server: massilia_protocol_028:328000704:mr33p01if-zteh08013901.mr.if.apple.com:8083:20P53:f929716938ef
< X-Cache: HIT
< Age: 741773
< 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Connection #0 to host swcdn.apple.com left intact
* Closing connection 0

The fix is to use If-Unmodified-Since instead of If-Modified-Since. This allows the resume to complete while ensuring that the same copy of the file is being downloaded. If the server-side file is newer, the download will fail with HTTP 412 (Precondition Failed), which is good because we don't want to resume an out-of-date download with new content at the end of the file.
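
To illustrate the difference between the two headers, here is a minimal sketch of how a server might evaluate them. It is a simplified model, not Apple's CDN logic; the function name conditional_status is hypothetical, and the dates are taken from the verbose curl output above.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None,
                       if_unmodified_since=None, has_range=False):
    """Simplified model of a server answering a conditional GET.

    Hypothetical helper for illustration only; real servers also
    consider ETags, If-Range, and other preconditions.
    """
    resource_time = parsedate_to_datetime(last_modified)
    # If-Unmodified-Since fails when the resource changed after the date.
    if if_unmodified_since is not None:
        if resource_time > parsedate_to_datetime(if_unmodified_since):
            return 412  # Precondition Failed
    # If-Modified-Since short-circuits when the resource has not changed.
    if if_modified_since is not None:
        if resource_time <= parsedate_to_datetime(if_modified_since):
            return 304  # Not Modified, no body is sent
    # Otherwise the request proceeds; a Range request gets partial content.
    return 206 if has_range else 200


SERVER_LAST_MODIFIED = "Tue, 10 Nov 2020 23:21:44 GMT"  # from the response
LOCAL_FILE_DATE = "Mon, 25 Jan 2021 06:02:19 GMT"       # from the request

# Current behaviour: the partial file is newer than the server copy.
print(conditional_status(SERVER_LAST_MODIFIED,
                         if_modified_since=LOCAL_FILE_DATE,
                         has_range=True))    # 304

# Proposed behaviour: same dates, but with If-Unmodified-Since.
print(conditional_status(SERVER_LAST_MODIFIED,
                         if_unmodified_since=LOCAL_FILE_DATE,
                         has_range=True))    # 206

# Server copy updated after the partial download was written.
print(conditional_status("Mon, 01 Feb 2021 00:00:00 GMT",
                         if_unmodified_since=LOCAL_FILE_DATE,
                         has_range=True))    # 412
```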

All that needed to change was prefixing the filename specified in the -z option with a -. The curl man page says:

Start the date expression with a dash (-) to make it request for a document that is older than the given date/time, default is a document that is newer than the specified date/time.
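
As a rough sketch of the change, the command could be assembled like this. The function build_curl_cmd and the example URL are hypothetical and are not the actual code in installinstallmacos.py; the curl flags are the ones shown in the command above.

```python
def build_curl_cmd(url, local_path, attempt_resume=True):
    """Return a curl argument list (hypothetical helper, for illustration)."""
    cmd = ["/usr/bin/curl", "-fL", "--create-dirs",
           "-o", local_path, "--compressed"]
    if attempt_resume:
        # The leading dash flips the time condition, so curl sends
        # If-Unmodified-Since instead of If-Modified-Since.
        cmd.extend(["-z", "-" + local_path])
        # "-C -" tells curl to work out the resume offset from the
        # size of the existing output file.
        cmd.extend(["-C", "-"])
    cmd.append(url)
    return cmd


# Example with a placeholder URL and path:
print(build_curl_cmd("http://example.com/BaseSystem.dmg",
                     "./content/BaseSystem.dmg"))
```

Passing attempt_resume=False leaves out both -z and -C, which corresponds to a fresh download.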

gregneagle commented 3 years ago

This is really excellent effort. I have a question, though: if Apple updates a file, how will we ever download it? If when a (possibly partial) file exists, we download using If-Unmodified-Since, it seems to me that once we have a complete download, if Apple changes/updates the file server-side, we'll never retrieve it until we remove the downloaded file. Am I missing something?

craig65535 commented 3 years ago

Hello @gregneagle,

If when a (possibly partial) file exists, we download using If-Unmodified-Since, it seems to me that once we have a complete download, if Apple changes/updates the file server-side, we'll never retrieve it until we remove the downloaded file

Once Apple changes the file server-side, subsequent requests for that file will fail with HTTP error 412 (precondition failed). The user would then have to re-run the script with --ignore-cache to refresh the file.

We could handle that programmatically by removing the file and retrying on error 412, but I felt that was too big of a change for this PR. I thought fixing resume might be enough of an improvement on its own.

gregneagle commented 3 years ago

Ugh. Feels like trading one undesired result for a different undesired result.

gregneagle commented 3 years ago

(IOW, right now resume is broken, but you'll get the current version of the file if/when Apple updates it. With your proposed change, resume works, but now you might have out-of-date files and not know it)

craig65535 commented 3 years ago

I think this is better as it never results in a silently partially-truncated download.

now you might have out-of-date files and not know it

No, the script will fail and the user will see error 412 if any files are out-of-date.

gregneagle commented 3 years ago

"No, the script will fail and the user will see error 412 if any files are out-of-date." which will result in an avalanche of support questions for me. :-(

gregneagle commented 3 years ago

I think if you get error 412, you should remove the existing file (partial or complete), and restart the download.
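
One possible shape for that behaviour is sketched below. This is not the code that ended up in the PR; the names download and curl_cmd are hypothetical. It relies on curl exiting with code 22 when -f is set and the server returns an HTTP status of 400 or above, which means it cannot tell a 412 apart from other HTTP errors. Treating every such failure during a resume as a stale file is an assumption made to keep the sketch short.

```python
import os
import subprocess

# curl exits with 22 when -f is given and the server returns HTTP >= 400.
CURL_HTTP_ERROR = 22


def curl_cmd(url, local_path, resume):
    """Build the curl argument list (hypothetical helper)."""
    cmd = ["/usr/bin/curl", "-fL", "--create-dirs",
           "-o", local_path, "--compressed"]
    if resume:
        # If-Unmodified-Since plus automatic resume offset.
        cmd.extend(["-z", "-" + local_path, "-C", "-"])
    cmd.append(url)
    return cmd


def download(url, local_path, run=subprocess.check_call):
    """Try to resume; on an HTTP error, delete the file and start over.

    `run` is injectable so the logic can be exercised without a network.
    """
    resume = os.path.exists(local_path)
    try:
        run(curl_cmd(url, local_path, resume))
    except subprocess.CalledProcessError as err:
        # Only retry when we were resuming and curl reported an HTTP error.
        if not (resume and err.returncode == CURL_HTTP_ERROR):
            raise
        # Assumption: the local copy (partial or complete) is stale.
        os.remove(local_path)
        run(curl_cmd(url, local_path, resume=False))


# Usage (performs a real download, so it is left commented out):
# download("http://example.com/BaseSystem.dmg", "./content/BaseSystem.dmg")
```

A stricter version could ask curl for the status code (for example with its -w option) and retry only on 412, at the cost of parsing curl's output.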

craig65535 commented 3 years ago

@gregneagle PTAL

gregneagle commented 3 years ago

Apologies for taking so long; I was caught up in important projects at work and forgot this was waiting for me to look at. It looks reasonable to me, and I'm hoping my delay with no follow-up changes from you means you haven't discovered any other issues.