hyperium / mime

MIMEs in Rust
https://docs.rs/mime
MIT License

Should mime just use the MIME sniffing algorithm? #106

Open seanmonstar opened 5 years ago

seanmonstar commented 5 years ago

The target domain of the mime crate is webdev. Instead of following the original RFCs (as is done now), perhaps it's best to just use the sniffing algorithm that is now used by web browsers.

seanmonstar commented 5 years ago

cc @nox @SimonSapin @rustonaut

SimonSapin commented 5 years ago

https://mimesniff.spec.whatwg.org/ is titled "MIME Sniffing", and it contains a relevant "parse a MIME type" algorithm.

But "sniffing" refers to looking at the contents of a file or the body of an HTTP response (in addition to other signals) to make a guess at the actual file format, in case the Content-Type header is missing or unspecific or inaccurate. For example, if the first 6 bytes of a file are GIF89a in ASCII it’s very probably a GIF, especially if it’s used in <img>. That spec also has algorithms for this.
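The GIF example comes down to comparing a byte prefix. Here is a minimal sketch that assumes a hypothetical helper named `looks_like_gif`. The mimesniff image pattern table lists both the `GIF87a` and `GIF89a` signatures, but this sketch covers only those two and none of the rest of the sniffing algorithm (no masks, no other formats, no use of the supplied Content-Type).

```rust
/// Hypothetical helper: true if `bytes` starts with one of the two GIF signatures.
/// This is only the GIF row of a sniffing table, not the full mimesniff algorithm.
fn looks_like_gif(bytes: &[u8]) -> bool {
    bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a")
}

fn main() {
    // A GIF89a header followed by some image bytes.
    assert!(looks_like_gif(b"GIF89a\x01\x00\x01\x00"));
    // A PNG signature is not a GIF.
    assert!(!looks_like_gif(b"\x89PNG\r\n\x1a\n"));
}
```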

This kind of sniffing can be useful, but I don’t know if it should be in scope for this crate.

seanmonstar commented 5 years ago

Sorry, I don't mean sniffing the body bytes, just using the parse algorithm mentioned in that document.

seanmonstar commented 5 years ago

So, looking through the test cases, I noticed this as a valid MIME type:

!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
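That test case is accepted because each of its four parts (type, subtype, parameter name, parameter value) consists solely of HTTP token code points, which mimesniff defines as ASCII alphanumerics plus ``!#$%&'*+-.^_`|~``. The sketch below rebuilds the string from that set and checks each part. The helper `is_token_mime_with_one_param` is hypothetical and only handles the `type/subtype;name=value` shape with one unquoted parameter; it is not a full MIME type parser.

```rust
/// HTTP token code points other than ASCII alphanumerics.
const TOKEN_PUNCT: &str = "!#$%&'*+-.^_`|~";

fn is_token_code_point(c: char) -> bool {
    c.is_ascii_alphanumeric() || TOKEN_PUNCT.contains(c)
}

/// True if `s` is non-empty and made only of HTTP token code points.
fn is_token(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_token_code_point)
}

/// Hypothetical check for `type/subtype;name=value` with exactly one
/// unquoted parameter. Not a general MIME type parser.
fn is_token_mime_with_one_param(s: &str) -> bool {
    let Some((ty, rest)) = s.split_once('/') else { return false };
    let Some((subtype, param)) = rest.split_once(';') else { return false };
    let Some((name, value)) = param.split_once('=') else { return false };
    [ty, subtype, name, value].iter().all(|part| is_token(part))
}

fn main() {
    // Every HTTP token code point, in the order used by the test case.
    let all = format!(
        "{}0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
        TOKEN_PUNCT
    );
    let input = format!("{a}/{a};{a}={a}", a = all);
    assert!(is_token_mime_with_one_param(&input));
    // This helper rejects a parameter name containing a space,
    // because a space is not a token code point.
    assert!(!is_token_mime_with_one_param("text/plain;char set=utf-8"));
}
```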

Something I appreciate in the API on mime/master is the distinction between MediaType and MediaRange. It allows things like text/* to be a MediaRange but not a MediaType. Combined with headers::ContentType, that would help prevent setting a frankly bogus Content-Type header (even though mimesniff says to parse it).

So I'm torn.
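To make the trade-off concrete, here is a minimal sketch of what a type-level split buys. The `MediaType`, `MediaRange`, and `ContentType` types below are hypothetical stand-ins written for this illustration; they are not the actual mime/master or headers APIs, and the parsing is deliberately naive (no parameters, no whitespace handling).

```rust
/// Hypothetical: a concrete media type; wildcards are rejected.
#[derive(Debug, PartialEq)]
struct MediaType { ty: String, subtype: String }

/// Hypothetical: a media range as used in Accept; `*` wildcards are allowed.
#[derive(Debug, PartialEq)]
struct MediaRange { ty: String, subtype: String }

/// Hypothetical header wrapper that can only hold a MediaType.
struct ContentType(MediaType);

/// Naive split on '/', lowercasing both halves.
fn split(s: &str) -> Option<(String, String)> {
    let (ty, subtype) = s.split_once('/')?;
    if ty.is_empty() || subtype.is_empty() { return None; }
    Some((ty.to_ascii_lowercase(), subtype.to_ascii_lowercase()))
}

impl MediaType {
    fn parse(s: &str) -> Option<MediaType> {
        let (ty, subtype) = split(s)?;
        if ty == "*" || subtype == "*" { return None; }
        Some(MediaType { ty, subtype })
    }
}

impl MediaRange {
    fn parse(s: &str) -> Option<MediaRange> {
        let (ty, subtype) = split(s)?;
        // Only `*/*` and `type/*` wildcards are meaningful; `*/html` is not.
        if ty == "*" && subtype != "*" { return None; }
        Some(MediaRange { ty, subtype })
    }
}

fn main() {
    assert!(MediaType::parse("text/*").is_none());
    assert!(MediaRange::parse("text/*").is_some());
    let header = ContentType(MediaType::parse("text/html").unwrap());
    // ContentType(MediaRange::parse("text/*").unwrap()) would not compile.
    assert_eq!(header.0.subtype, "html");
}
```

The cost shows up the moment input comes from the network: a browser-compatible parser has to accept whatever mimesniff accepts, so a strict `MediaType` would either reject inputs browsers tolerate or need a separate, looser type anyway.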

seanmonstar commented 5 years ago

After some more thought, the advantages of just following what the Fetch spec wants outweigh those of having the MediaType/MediaRange split.

So the new plan is to remove the split, go back to a single Mime type, and support only the mimesniff parsing algorithm.
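As a rough sketch of what supporting only the mimesniff parsing algorithm could look like, here is a simplified Rust transcription of the spec's "parse a MIME type" steps. The names `Mime`, `parse_mime`, `collect`, and `collect_quoted` are hypothetical, not the crate's API. It handles string input only and omits serialization, so treat it as an illustration rather than a conforming implementation.

```rust
#[derive(Debug, PartialEq)]
struct Mime {
    ty: String,
    subtype: String,
    params: Vec<(String, String)>, // insertion order; first occurrence wins
}

fn is_http_ws(c: char) -> bool { matches!(c, '\n' | '\r' | '\t' | ' ') }

fn is_token_cp(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c)
}

fn is_quoted_string_token_cp(c: char) -> bool {
    matches!(c, '\t' | ' '..='~' | '\u{80}'..='\u{FF}')
}

/// Collect code points while `pred` holds, advancing `pos`.
fn collect(s: &[char], pos: &mut usize, pred: impl Fn(char) -> bool) -> String {
    let start = *pos;
    while *pos < s.len() && pred(s[*pos]) { *pos += 1; }
    s[start..*pos].iter().collect()
}

/// "Collect an HTTP quoted string" with extract-value set; s[*pos] must be '"'.
fn collect_quoted(s: &[char], pos: &mut usize) -> String {
    let mut value = String::new();
    *pos += 1; // skip the opening quote
    loop {
        value.push_str(&collect(s, pos, |c| c != '"' && c != '\\'));
        if *pos >= s.len() { break; }
        let c = s[*pos];
        *pos += 1;
        if c != '\\' { break; } // closing quote
        if *pos >= s.len() { value.push('\\'); break; }
        value.push(s[*pos]); // escaped code point
        *pos += 1;
    }
    value
}

fn parse_mime(input: &str) -> Option<Mime> {
    let s: Vec<char> = input.trim_matches(is_http_ws).chars().collect();
    let mut pos = 0;
    let ty = collect(&s, &mut pos, |c| c != '/');
    if ty.is_empty() || !ty.chars().all(is_token_cp) || pos >= s.len() { return None; }
    pos += 1; // skip '/'
    let raw_subtype = collect(&s, &mut pos, |c| c != ';');
    let subtype = raw_subtype.trim_end_matches(is_http_ws);
    if subtype.is_empty() || !subtype.chars().all(is_token_cp) { return None; }
    let mut mime = Mime {
        ty: ty.to_ascii_lowercase(),
        subtype: subtype.to_ascii_lowercase(),
        params: Vec::new(),
    };
    while pos < s.len() {
        pos += 1; // skip ';'
        collect(&s, &mut pos, is_http_ws);
        let name = collect(&s, &mut pos, |c| c != ';' && c != '=').to_ascii_lowercase();
        if pos < s.len() {
            if s[pos] == ';' { continue; }
            pos += 1; // skip '='
        }
        if pos >= s.len() { break; }
        let value = if s[pos] == '"' {
            let v = collect_quoted(&s, &mut pos);
            collect(&s, &mut pos, |c| c != ';'); // discard anything after the quote
            v
        } else {
            let v = collect(&s, &mut pos, |c| c != ';');
            let v = v.trim_end_matches(is_http_ws).to_string();
            if v.is_empty() { continue; }
            v
        };
        if !name.is_empty()
            && name.chars().all(is_token_cp)
            && value.chars().all(is_quoted_string_token_cp)
            && !mime.params.iter().any(|(n, _)| *n == name)
        {
            mime.params.push((name, value));
        }
    }
    Some(mime)
}

fn main() {
    let m = parse_mime(" Text/HTML ; Charset=\"utf-8\" ; charset=latin1").unwrap();
    assert_eq!((m.ty.as_str(), m.subtype.as_str()), ("text", "html"));
    assert_eq!(m.params, vec![("charset".to_string(), "utf-8".to_string())]);
    assert!(parse_mime("text").is_none());
}
```

Note how lenient this is compared with the RFC grammar: invalid or duplicate parameters are silently dropped instead of failing the whole parse, which is the browser-compatible behavior the thread is after.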

nox commented 5 years ago

The closer it is to the mimesniff algorithm, the more we can make use of it.

nox commented 5 years ago

What would also be useful is a way to represent just the essence of a MIME type, because many specs have prose about that.
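In mimesniff, a MIME type's essence is its type, "/", and its subtype, with parameters left out, and spec prose often tests it directly (for example, "whose essence is text/html"). A minimal sketch, assuming a hypothetical `Mime` struct whose fields were already lowercased by a parser, and a hypothetical `is_html` check standing in for such prose:

```rust
/// Hypothetical parsed MIME type; field names are assumptions.
struct Mime {
    ty: String,      // assumed already ASCII-lowercased by the parser
    subtype: String, // assumed already ASCII-lowercased by the parser
    params: Vec<(String, String)>,
}

impl Mime {
    /// The essence: type, "/", subtype, with no parameters.
    fn essence(&self) -> String {
        format!("{}/{}", self.ty, self.subtype)
    }
}

/// Hypothetical check mirroring spec prose like "whose essence is text/html".
fn is_html(mime: &Mime) -> bool {
    mime.essence() == "text/html"
}

fn main() {
    let m = Mime {
        ty: "text".into(),
        subtype: "html".into(),
        params: vec![("charset".into(), "utf-8".into())],
    };
    assert_eq!(m.essence(), "text/html");
    assert!(is_html(&m));
    assert_eq!(m.params.len(), 1); // essence() ignores the parameters
}
```

A dedicated essence type (rather than a `String`) would let such checks compare without allocating, but the string form is enough to show the idea.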

ghostd commented 4 years ago

Hi,

Is there a way to expose both parsers (RFC and mimesniff)? I'd like to make some Servo tests pass, so I need to follow the mimesniff algorithm. @SimonSapin has already implemented it in rust-url (though the crate doesn't officially expose it). Should I duplicate the code in Servo, or can I help here?

Regards