rgrove / rawgit

Served files from raw.githubusercontent.com, but with the correct content types. No longer actively developed.
https://rawgit.com
MIT License

CDN subdomain does not include `Content-Length` header #81

Closed: captbaritone closed this issue 9 years ago

captbaritone commented 9 years ago

When I request https://rawgit.com/captbaritone/llama/master/llama-2.91.mp3, I get redirected to raw.githubusercontent.com, whose response includes the `Content-Length` header:

```
$ curl -I https://raw.githubusercontent.com/captbaritone/llama/master/llama-2.91.mp3
HTTP/1.1 200 OK
Content-Security-Policy: default-src 'none'
X-XSS-Protection: 1; mode=block
X-Frame-Options: deny
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
ETag: "b4158fd949962291c97b36a48c02992c28bfa147"
Content-Type: audio/mpeg
Cache-Control: max-age=300
Content-Length: 38912
Accept-Ranges: bytes
Date: Thu, 01 Oct 2015 03:39:32 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-sjc3125-SJC
X-Cache: MISS
X-Cache-Hits: 0
Vary: Authorization,Accept-Encoding
Access-Control-Allow-Origin: *
Expires: Thu, 01 Oct 2015 03:44:32 GMT
Source-Age: 0
```

When I request https://cdn.rawgit.com/captbaritone/llama/master/llama-2.91.mp3, the response does not include a `Content-Length` header:

```
$ curl -I https://cdn.rawgit.com/captbaritone/llama/master/llama-2.91.mp3
HTTP/1.1 200 OK
Date: Thu, 01 Oct 2015 03:39:56 GMT
Content-Type: audio/mpeg
Connection: keep-alive
X-Content-Type-Options: nosniff
X-Robots-Tag: none
Access-Control-Allow-Origin: *
Cache-Control: max-age=315569000
ETag: "b4158fd949962291c97b36a48c02992c28bfa147"
Vary: Accept-Encoding
RawGit-Cache-Status: HIT
Server: NetDNA-cache/2.2
X-Cache: HIT
```

The `Content-Length` header would be useful for me, since the Web Audio API needs it to determine the length of the MP3.
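A minimal sketch for checking this difference programmatically, assuming the `curl -I` output has been saved as text: it splits a raw header dump into the status line and a case-insensitive header map, then returns `Content-Length` as an integer, or `None` when the header is absent. The function names `parse_header_dump` and `content_length` are hypothetical, and the two sample dumps are trimmed excerpts of the responses above.

```python
from typing import Dict, Optional, Tuple


def parse_header_dump(dump: str) -> Tuple[str, Dict[str, str]]:
    """Split a `curl -I`-style dump into the status line and a header dict.

    Header names are lower-cased because HTTP field names are case-insensitive.
    If a field repeats, this sketch simply keeps the last value.
    """
    lines = dump.strip().splitlines()
    status = lines[0].strip()
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            headers[name.strip().lower()] = value.strip()
    return status, headers


def content_length(dump: str) -> Optional[int]:
    """Return Content-Length as an int, or None if the response omits it."""
    _, headers = parse_header_dump(dump)
    value = headers.get("content-length")
    return int(value) if value is not None else None


# Trimmed excerpts of the two HEAD responses shown above.
ORIGIN = """HTTP/1.1 200 OK
Content-Type: audio/mpeg
Content-Length: 38912
Accept-Ranges: bytes"""

CDN = """HTTP/1.1 200 OK
Content-Type: audio/mpeg
Cache-Control: max-age=315569000
RawGit-Cache-Status: HIT"""

print(content_length(ORIGIN))  # 38912
print(content_length(CDN))     # None
```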

Thanks a bunch for this great service.

rgrove commented 9 years ago

At first this looked like a bug, since RawGit should at least be sending a `Transfer-Encoding` header. On investigation, though, I realized that `curl -I` performs a HEAD request. A response to a HEAD request has no body, so `Transfer-Encoding` isn't necessary there, and the server doesn't send it.

If you do a GET request instead, you can see that the response does include a `Transfer-Encoding` header:

```
$ curl -s -D - https://cdn.rawgit.com/captbaritone/llama/master/llama-2.91.mp3 -o /dev/null
HTTP/1.1 200 OK
Date: Thu, 01 Oct 2015 17:01:07 GMT
Content-Type: audio/mpeg
Transfer-Encoding: chunked
Connection: keep-alive
X-Content-Type-Options: nosniff
X-Robots-Tag: none
Access-Control-Allow-Origin: *
Cache-Control: max-age=315569000
ETag: "b4158fd949962291c97b36a48c02992c28bfa147"
Vary: Accept-Encoding
RawGit-Cache-Status: HIT
Server: NetDNA-cache/2.2
X-Cache: HIT
```
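The HEAD-versus-GET difference can be reproduced locally without RawGit. This is a hypothetical stand-in, not RawGit's or NetDNA's actual server: a Python `http.server` handler that frames GET bodies with chunked encoding and sends no framing headers on HEAD, queried with the standard `http.client`. The path `/llama.mp3` and the 38912-byte dummy payload are made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAYLOAD = b"\xff\xfb" * 19456  # hypothetical 38912-byte stand-in for the MP3


class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked encoding is an HTTP/1.1 feature

    def _start(self, with_body: bool) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "audio/mpeg")
        if with_body:
            # Only the GET response carries a body, so only it needs framing.
            self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()

    def do_HEAD(self) -> None:
        self._start(with_body=False)

    def do_GET(self) -> None:
        self._start(with_body=True)
        for i in range(0, len(PAYLOAD), 8192):
            chunk = PAYLOAD[i:i + 8192]
            # Each chunk: size in hex, CRLF, data, CRLF.
            self.wfile.write(b"%X\r\n%s\r\n" % (len(chunk), chunk))
        self.wfile.write(b"0\r\n\r\n")  # zero-size chunk ends the body

    def log_message(self, format, *args):  # keep the demo quiet
        pass


def fetch(port: int, method: str):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/llama.mp3")
    resp = conn.getresponse()
    body = resp.read()  # http.client de-chunks GET bodies; HEAD yields b""
    headers = dict(resp.getheaders())
    conn.close()
    return headers, body


server = ThreadingHTTPServer(("127.0.0.1", 0), ChunkedHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

head_headers, _ = fetch(port, "HEAD")
get_headers, get_body = fetch(port, "GET")
server.shutdown()
server.server_close()

print("HEAD:", head_headers)
print("GET: ", get_headers)
```

Printing the two header dictionaries shows the same pattern as the `curl` output: `Transfer-Encoding: chunked` appears only on the GET response, and neither response carries `Content-Length`.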

The good news (for me, at least) is that there's not actually a bug here. RawGit uses chunked responses by design, since chunked encoding lets the response be streamed with little or no buffering. But this doesn't help you, since you apparently need to know the length of the complete content ahead of time.
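To illustrate why chunked encoding permits streaming, here is a minimal sketch of HTTP/1.1 chunked framing: each chunk is prefixed with its size in hexadecimal, and a zero-size chunk ends the body. The helpers `encode_chunked` and `decode_chunked` are hypothetical; the decoder drops chunk extensions and ignores trailer fields, so it is not a complete parser.

```python
from typing import Iterable, Iterator


def encode_chunked(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield chunked framing for each piece as it arrives.

    No total length is needed up front: every chunk carries its own size.
    """
    for chunk in chunks:
        if chunk:  # a zero-size chunk would signal end of body, so skip empties
            yield b"%X\r\n" % len(chunk) + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk, no trailer fields


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a complete chunked body (sketch: extensions and trailers ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size_field = data[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # only now does the client know the total length
        body += data[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF
    return bytes(body)


wire = b"".join(encode_chunked([b"ID3", b"frame data"]))
print(wire)  # b'3\r\nID3\r\nA\r\nframe data\r\n0\r\n\r\n'
print(decode_chunked(wire))  # b'ID3frame data'
```

Because each chunk carries its own size, the sender can emit the first bytes before it knows how long the whole body will be; the client learns the total length only after it reads the final `0` chunk.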

Unfortunately, HTTP/1.1 prohibits sending both `Transfer-Encoding: chunked` and a `Content-Length` header in the same response. I could have RawGit avoid chunked encoding for `audio/mpeg` files, but that would open a Pandora's box. Should RawGit also support range requests for audio files to allow seeking? What about video files? Should RawGit even be in the media hosting business?
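The constraint can be sketched as a framing decision a server makes per response. This assumes a hypothetical helper, `build_response`, that either buffers the whole body so it can announce `Content-Length`, or passes the pieces through for chunked framing. RFC 7230, section 3.3.2 says a sender must not send `Content-Length` in a message that contains `Transfer-Encoding`, so the helper never sets both.

```python
from typing import Dict, Iterable, Tuple


def build_response(
    chunks: Iterable[bytes], buffer_body: bool
) -> Tuple[Dict[str, str], Iterable[bytes]]:
    """Return (framing headers, body pieces) for an HTTP/1.1 response.

    Exactly one framing header is set, because Content-Length and
    Transfer-Encoding must not appear together in one message.
    """
    if buffer_body:
        # The whole body has to be read before any header can be sent,
        # which costs memory and delays the first byte.
        body = b"".join(chunks)
        return {"Content-Length": str(len(body))}, [body]
    # Streaming: pieces are passed through unchanged; the caller is expected
    # to wrap each one in chunked framing as it sends it.
    return {"Transfer-Encoding": "chunked"}, chunks


mp3_pieces = [b"\xff\xfb" * 1000, b"\xff\xfb" * 500]
print(build_response(mp3_pieces, buffer_body=True)[0])   # {'Content-Length': '3000'}
print(build_response(mp3_pieces, buffer_body=False)[0])  # {'Transfer-Encoding': 'chunked'}
```

The trade-off described in the comment falls out directly: announcing the size means reading the entire file first, while chunked framing lets bytes flow immediately at the cost of not telling the client how many are coming.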

I think the answer to that -- at least right now -- is no. While I wish I could make everyone happy, there are a lot of complexities involved in serving audio and video properly, some of which would require compromises that could affect how efficiently other file types are served. There's also the cost to consider: while RawGit's CDN bandwidth is generously donated by MaxCDN, I still pay the costs of serving files from the origin server to the CDN, and those costs are slowly but steadily rising.

There's also the fact that RawGit's core value proposition -- serving files from GitHub with the correct Content-Type headers -- isn't even necessary for most binary file types, since GitHub already serves those files with the correct Content-Type headers. GitHub Pages might be an even better solution.

So, after careful consideration, I'm afraid I have to decline to change this behavior for now. I'm sorry to disappoint.

captbaritone commented 9 years ago

Thanks for the detailed response. That seems like a very reasonable approach. For now I'll just use raw.githubusercontent.com and see if anybody complains.