mattly / bork

the Bash-Operated Reconciling Kludge

download fails when Content-Length is missing from headers #73

Open bcomnes opened 8 years ago

bcomnes commented 8 years ago

The following assertion fails:

ok download ~/.ssh/authorized_keys "https://github.com/bcomnes.keys" --size

because the HTTP response headers for this GitHub request do not include a Content-Length header:

bret-mbr:.ssh bret$ bork status ~/.dotfiles/install/ssh.sh 
conflict (upgradable): download authorized_keys https://github.com/bcomnes.keys
expected size:  bytes
received size:  bytes
bret-mbr:.ssh bret$ curl -sI https://github.com/bcomnes.keys
HTTP/1.1 200 OK
Server: GitHub.com
Date: Mon, 09 May 2016 00:13:04 GMT
Content-Type: text/plain; charset=utf-8
Status: 200 OK
Cache-Control: no-cache
Vary: X-PJAX
X-UA-Compatible: IE=Edge,chrome=1
Set-Cookie: logged_in=no; domain=.github.com; path=/; expires=Fri, 09 May 2036 00:13:04 -0000; secure; HttpOnly
X-Request-Id: d529280b38e3de27e17235e81f5cf091
X-Runtime: 0.008604
Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; child-src render.githubusercontent.com; connect-src 'self' uploads.github.com status.github.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com api.braintreegateway.com client-analytics.braintreegateway.com wss://live.github.com; font-src assets-cdn.github.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src 'self' data: assets-cdn.github.com identicons.github.com www.google-analytics.com collector.githubapp.com *.gravatar.com *.wp.com checkout.paypal.com *.githubusercontent.com; media-src 'none'; object-src assets-cdn.github.com; plugin-types application/x-shockwave-flash; script-src assets-cdn.github.com; style-src 'unsafe-inline' assets-cdn.github.com
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Public-Key-Pins: max-age=5184000; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="RRM1dGqnDFsCJXBTHky16vi1obOlCgFFn/yOhI/y+ho="; pin-sha256="k2v657xBsOVe1PQRwOsHsw3bsGT2VzIqz5K+59sNQws="; pin-sha256="K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q="; pin-sha256="IQBnNBEiFuhj+8x6X8XLgh01V9Ic5/V3IRQLNFFc7v4="; pin-sha256="iie1VXtL7HzAMF+/PVPR9xzT80kQxdZeJ+zduCB3uj0="; pin-sha256="LvRiGEjRqfzurezaWuj8Wie2gyHMrW5Q06LspMnox7A="; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
Vary: Accept-Encoding
X-Served-By: 1c0ce1a213af16e49d5419559ef44f50
X-GitHub-Request-Id: 499DD518:7DA5:9D56CB7:572FD60F

Not really sure what the right way to handle this is. Obviously, this strategy shouldn't be used in this kind of scenario. Maybe some kind of alternative hash comparison could be used instead? I found this while dealing with the https://github.com/mattly/bork/issues/72 bug.

frdmn commented 8 years ago

Another great catch. Thanks for reporting the issues, @bcomnes 👍

I feel like hashing won't really work unless the file is actually downloaded completely (which we don't really want in this case). Hashing just the filename, on the other hand, is not enough for an in-depth comparison.

I suggest we skip the "comparison" feature in this case and just re-download the file when there's no Content-Length set. What do you think?
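
For illustration, a rough sketch of detecting the missing header so the size comparison can be skipped (plain curl/awk, not bork's actual helpers; $sourceurl is a stand-in):

# Probe the headers only: -I sends a HEAD request, -L follows redirects
content_length=$(curl -sIL "$sourceurl" | tr -d '\r' | awk 'tolower($1) == "content-length:" {print $2}' | tail -n 1)

if [ -z "$content_length" ]; then
  # No Content-Length: the size comparison is meaningless, fall back to re-downloading
  echo "no Content-Length header; skipping size check"
else
  echo "remote size: $content_length bytes"
fi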

/cc @mattly

bcomnes commented 8 years ago

The usefulness of the hashing comes from knowing whether the remote file is different from what is present on the system... when that kind of information is important. For example, I really want to see whether ~/.ssh/authorized_keys gets updated or not, for various reasons.

Yes, there is a cost to downloading and hashing, comparing, downloading again to replace, and then downloading and hashing again to double-check, but for small files that cost is negligible in exchange for the 'did the file change or not' data point.

It also shares a timing/race vulnerability similar to the size check's: the file might change on the server between requests, or incorrect headers might be sent across all 3 requests. One improvement would be to fetch the remote content once, cache it, perform the desired checks against that copy, then attempt the replacement and re-check, all from a single request. For small files, you can assign the contents to a variable, which makes this pretty achievable.

Here is an almost working prototype:

if [ -n "$hash" ]; then
  sourcesha=$(bake shasum -a 256 "\"$targetfile\"")
  remotesha=$(curl -sL $sourceurl | shasum -a 256 )
  if [ "$sourcesha" != "$remotesha" ]; then
    echo "expected sha256: $remotesha"
    echo "received sha256: $sourcesha"
    return $STATUS_CONFLICT_UPGRADE
  fi
fi
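
In the same spirit, a sketch of the download-once variant described above (plain bash with hypothetical names, not wired into bork's bake/status machinery): fetch the file a single time into a temp file, hash-compare, and reuse that same copy as the replacement on mismatch.

download_if_changed() {
  local sourceurl=$1 targetfile=$2
  local tmpfile
  tmpfile=$(mktemp)

  # Download once; reuse this copy for both the comparison and the replacement
  curl -sfL "$sourceurl" -o "$tmpfile" || { rm -f "$tmpfile"; return 1; }

  local localsha remotesha
  localsha=$(shasum -a 256 "$targetfile" 2>/dev/null | awk '{print $1}')
  remotesha=$(shasum -a 256 "$tmpfile" | awk '{print $1}')

  if [ "$localsha" != "$remotesha" ]; then
    # The cached download becomes the new file; no second request needed
    mv "$tmpfile" "$targetfile"
  else
    rm -f "$tmpfile"
  fi
}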

Third, it's not clear to me what happens if the server returns a 404 or a 500; it looks like the download upgrade will still offer to upgrade rather than fail outright, since there are no status code checks? This is a bigger issue for compiled scripts that perform the upgrade automatically.
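
For reference, a minimal sketch of such a check (plain curl, using exit rather than bork's status return values):

# -w '%{http_code}' prints just the status code; -o /dev/null discards the body
status=$(curl -sL -o /dev/null -w '%{http_code}' "$sourceurl")

if [ "${status:0:1}" != "2" ]; then
  echo "server returned HTTP $status; refusing to upgrade"
  exit 1
fi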

mattly commented 6 years ago

I don't remember what use case my initial stab at download was intending to solve, but I can 100% guarantee you it had a Content-Length header :p.

I think you both have good thoughts on the correct behaviors, and hash is definitely a better thing to check against than size. I'm going to deprecate the size option and remove it for 1.0, and add in a hash option. Thoughts on letting people specify the algorithm?
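
For the sake of illustration (not necessarily the final interface), shasum already takes the algorithm as a flag, so a configurable check could be as small as this, assuming a hypothetical hash_algo option defaulting to SHA-256:

algo=${hash_algo:-256}   # e.g. 1, 256, or 512
localsha=$(shasum -a "$algo" "$targetfile" | awk '{print $1}')
remotesha=$(curl -sL "$sourceurl" | shasum -a "$algo" | awk '{print $1}')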

bcomnes commented 6 years ago

Unless trivial to add, I would leave it to those who want to customize the algorithm to explore/implement that. I'm totally fine with a sane default.