whatwg / mimesniff

MIME Sniffing Standard
https://mimesniff.spec.whatwg.org/

Support for sniffing text/csv and text/tab-separated-values #9

Closed bramp closed 8 years ago

bramp commented 8 years ago

I would like to request support for sniffing the "text/csv" and "text/tab-separated-values" types.

RFC 4180 lays out the "Common Format and MIME Type for Comma-Separated Values (CSV) Files", and RFC 7111 updates it.

A simple heuristic for detecting CSV or TSV: if the file has already been identified as text/plain, check whether the first few lines each contain the same number of commas (or tabs).
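To make the proposal concrete, here is a minimal Python sketch of that heuristic. The function names, the five-line window, and the choice to ignore delimiters inside double-quoted fields are assumptions made for illustration; none of this comes from the MIME Sniffing Standard or from any existing implementation.

```python
def count_delimiter(line: str, delimiter: str) -> int:
    """Count delimiter occurrences that fall outside double-quoted fields."""
    count = 0
    in_quotes = False
    for ch in line:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            count += 1
    return count


def guess_delimited_type(text: str, max_lines: int = 5) -> str:
    """Guess a MIME type for text that was already classified as text/plain.

    Hypothetical helper: returns text/csv or text/tab-separated-values when
    the first few non-blank lines all contain the same non-zero number of
    delimiters, and text/plain otherwise.
    """
    lines = [ln for ln in text.splitlines()[:max_lines] if ln.strip()]
    if len(lines) < 2:
        # A single line gives nothing to compare against.
        return "text/plain"
    for delimiter, mime in ((",", "text/csv"), ("\t", "text/tab-separated-values")):
        counts = {count_delimiter(ln, delimiter) for ln in lines}
        if len(counts) == 1 and counts.pop() > 0:
            return mime
    return "text/plain"
```

The sketch is deliberately crude: ordinary prose whose opening lines happen to contain the same number of commas would be misclassified, and quoted fields that span several lines are not handled.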

annevk commented 8 years ago

Why? It would be much better to label the resource correctly. Sniffing is bad.

bramp commented 8 years ago

I agree setting the MIME type would be best, but this would be for the case where the MIME type isn't set, or is set incorrectly. Isn't that the point of this project?

marcoscaceres commented 8 years ago

> I agree setting the MIME type would be best, but this would be for the case where the MIME type isn't set, or is set incorrectly. Isn't that the point of this project?

Kinda. The point of the project is to capture what browsers, unfortunately, already do wrt sniffing (for legacy reasons and compat with old web and some legacy HTTP server software serving the wrong types) - while also discouraging adding new types of sniffing for rare types.

marcoscaceres commented 8 years ago

In particular, it explains how HTMLImageElement is able to recover when an image is served with text/plain as its type. However, for text/csv, a developer would generally be using XHR or fetch(), so error recovery can be handled by the developer (instead of the browser).
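As a rough illustration of developer-side recovery, the sketch below parses a response body as CSV no matter what Content-Type it was served with. It is written in Python rather than with XHR or fetch(), and the function name and the UTF-8 fallback are assumptions made for this example only.

```python
import csv
import io


def parse_csv_body(body: bytes, content_type: str = "") -> list[list[str]]:
    """Parse a response body as CSV regardless of its declared Content-Type.

    Hypothetical helper: the caller already knows the resource is CSV, so
    the media type in the header is ignored and only its charset is used.
    """
    charset = "utf-8"  # assumed fallback when the header names no charset
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset" and value:
            charset = value.strip('"')
    text = body.decode(charset)
    # newline="" lets the csv module handle line endings inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Because the application knows what it requested, it can decide how to treat a mislabeled body without any sniffing in the browser.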

bramp commented 8 years ago

Thank you all for your replies.

I came across this project because a command-line tool I use relies on an implementation of your standard to sniff the type of a local file. However, since you are trying to formalise the behaviour browsers already follow, my request doesn't make sense. Instead, I will work with the command-line tool to have it use a more complete sniffing library, such as libmagic.