inveniosoftware / invenio-files-rest

REST API for uploading/downloading files for Invenio.
https://invenio-files-rest.readthedocs.io
MIT License

models: mimetype detection issue for gzip and other compressed types #199

Closed lnielsen closed 5 years ago

lnielsen commented 5 years ago

The mimetype is guessed like this in models.py:

>>> import mimetypes
>>> mimetypes.guess_type('test.csv')
('text/csv', None)
>>> mimetypes.guess_type('test.zip')
('application/zip', None)

Essentially, only the first element of the returned tuple (the type) is used, and the second element (the encoding) is ignored. This causes trouble for gzip- and bzip2-compressed files, for example:

>>> mimetypes.guess_type('test.csv.gz')
('text/csv', 'gzip')
>>> mimetypes.guess_type('test.csv.bz2')
('text/csv', 'bzip2')

As a result, text/csv is guessed as the mimetype for test.csv.gz. Combined with the sanitization of mimetypes, the client is sent a text/plain content type. Upstream servers (like nginx) that are configured to gzip plain text then compress the already compressed file again.
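One possible way to avoid this, sketched here only as an illustration and not as the fix that was actually applied in models.py, is to look at the encoding returned by mimetypes.guess_type and report the compression format's own mimetype when an encoding is present. The helper name guess_mimetype, the ENCODING_MIMETYPES mapping, and the fallback default are all assumptions made for this example:

```python
import mimetypes

# Hypothetical mapping from the encoding reported by mimetypes.guess_type
# to the mimetype of the compressed container (an assumption for this sketch).
ENCODING_MIMETYPES = {
    "gzip": "application/gzip",
    "bzip2": "application/x-bzip2",
}


def guess_mimetype(filename, default="application/octet-stream"):
    """Guess a mimetype, taking the content encoding into account."""
    mimetype, encoding = mimetypes.guess_type(filename)
    if encoding is not None:
        # The file is compressed: describe the container, not the payload.
        return ENCODING_MIMETYPES.get(encoding, default)
    # No encoding detected: use the guessed type, or the default if unknown.
    return mimetype or default


print(guess_mimetype("test.csv.gz"))
print(guess_mimetype("test.csv.bz2"))
```

With a mapping like this, test.csv.gz would no longer be reported as text/csv, so it would not be sanitized to text/plain and picked up by a gzip-on-plain-text rule upstream.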

lnielsen commented 5 years ago

https://github.com/inveniosoftware/invenio-files-rest/blob/master/invenio_files_rest/models.py#L956-L963