steiza / docstore

For any civics-minded organization that needs a simple place to host documents publicly
http://a2docs.org/

consider setting content-type on attached files #29

Closed by crewjam 3 years ago

crewjam commented 3 years ago

It would be nice to set the Content-Type response header for attached files, which might make reading in e.g. Chrome or an iOS webview more convenient. I'm not sure whether content-disposition: attachment prevents the webview from displaying the document in the native PDF viewer, but I can experiment with that if needed.

$ curl -v https://a2docs.org/file/570/2760+Stanton+-+FOIA+Final.pdf
> GET /file/570/2760+Stanton+-+FOIA+Final.pdf HTTP/2
> Host: a2docs.org
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/2 200
< date: Fri, 11 Dec 2020 16:44:08 GMT
< content-type: application/octet-stream
< content-length: 167169
< server: TornadoServer/6.0.3
< content-disposition: attachment; filename="2760 Stanton - FOIA Final.pdf"
< etag: "770df252e24b5b9c39539ec2a8a459da19a45e1e"
< strict-transport-security: max-age=15768000

A link that does display inline correctly:

$ curl -v https://cdn.ballotpedia.org/images/c/cf/2020_Hawaii_sample_ballot_%28Hawaii_County%29.pdf
> Host: cdn.ballotpedia.org
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/2 200
< content-type: application/pdf
< content-length: 648057
< date: Fri, 11 Dec 2020 16:45:16 GMT
< last-modified: Tue, 20 Oct 2020 16:35:33 GMT
< etag: "bd9648313b96686eb357f26a728f7914"
< accept-ranges: bytes
< server: AmazonS3
< x-cache: Miss from cloudfront
< via: 1.1 63b9a4cda82206b6b34aab8f3e958cbe.cloudfront.net (CloudFront)
< x-amz-cf-pop: ORD52-C1
< x-amz-cf-id: l2t0ZfreqWrmhldPoyPu70kdH7JORaGyjK_ZIRpP_U6cOaLV2gyJTQ==

vielmetti commented 3 years ago

Looks like the code that does the work would go here:

https://github.com/steiza/docstore/blame/02f2a881ab103697b02ffe3e6ffa8d545caf0d8f/docstore#L280

steiza commented 3 years ago

Oh hey @crewjam, that's a great suggestion!

From poking around, it looks like most of the content is .pdf, but I also saw an .xls. I started researching MIME types and it turns out there are a lot of them. But there's also a MIME type map in the Python 3 standard library (!!!) My mind was blown.
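
For anyone following along, here is a minimal sketch of that standard-library lookup. It is not the code from the commit linked below; `guess_content_type` and `FALLBACK_TYPE` are hypothetical names, and falling back to `application/octet-stream` for unknown extensions is an assumption.

```python
import mimetypes

# Assumed fallback for files whose extension is not in the MIME map.
FALLBACK_TYPE = "application/octet-stream"


def guess_content_type(filename: str) -> str:
    """Return a Content-Type value guessed from the filename's extension."""
    # guess_type returns a (type, encoding) tuple; type is None when unknown.
    # The encoding part (e.g. "gzip") is ignored in this sketch.
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or FALLBACK_TYPE


print(guess_content_type("2760 Stanton - FOIA Final.pdf"))  # application/pdf
```

In a Tornado handler the result would presumably be passed to `self.set_header("Content-Type", ...)` before the file body is written.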

I think https://github.com/steiza/docstore/commit/b904d1f8519a27e5d9fa256de0f1b71718cfd42d should resolve this issue.

I have no idea how busy @eby is, but he'd be the one to roll out this change.

eby commented 3 years ago

This is live in production. Seems to work on the PDFs I tried.
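
A rough sketch of how a check like this could be scripted instead of reading `curl -v` output by eye. The helper names `fetch_headers` and `has_specific_type` are hypothetical, and treating anything other than `application/octet-stream` as "specific" is an assumption, not a rule from the docstore code.

```python
import urllib.request


def fetch_headers(url: str) -> dict:
    """Request the URL and return its response headers with lowercased names."""
    with urllib.request.urlopen(url) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def has_specific_type(headers: dict) -> bool:
    """True when content-type is present and is not the generic octet-stream."""
    content_type = headers.get("content-type", "")
    # Drop any parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type not in ("", "application/octet-stream")


# Header values copied from the two curl transcripts earlier in this thread.
before_fix = {"content-type": "application/octet-stream"}
ballotpedia = {"content-type": "application/pdf"}
print(has_specific_type(before_fix))   # False
print(has_specific_type(ballotpedia))  # True
```

Against the live site, the same check would be `has_specific_type(fetch_headers(url))` with the URL of any hosted file.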

crewjam commented 3 years ago

Thanks for the effort, folks! Works for me. 🙇