w3c / FileAPI

File API
https://w3c.github.io/FileAPI/
Other
104 stars 44 forks source link

Implementations allow all values in type getter #43

Open Manishearth opened 7 years ago

Manishearth commented 7 years ago

The following code:

a = new Blob([1,2,3], {type: "text/(plain"});
console.log(a.type) // prints text/(plain

works in Chrome and Firefox. It also works if you replace ( with \x02 or something

The spec, however, asks user agents to restrict values in the range U+0020 to U+007E in the constructor, and in the getter it only allows a nonempty value to be returned if it is a parseable MIME type.

Perhaps we should relax these restrictions in the spec?

inexorabletash commented 7 years ago

See also #13

FWIW, Chrome throws on non-ASCII in the ctor, e.g.

new Blob([], {type: '\x80'})

I've had a local patch to align with the spec (don't throw, just replace type with empty string if it contains anything outsize U+0020 ... U+007E, and have getter return empty on nonparsable MIME type) but I haven't landed because I agree the spec seems weird (and no-one else implements it as written)

sicking commented 7 years ago

I think the two behaviors that would make sense to me are:

I really don't care strongly though.

Manishearth commented 7 years ago

FWIW, Safari follows the Chrome behavior (allow bad mime types, restrict characters), and IE follows the Firefox behavior (allow all the things)

annevk commented 6 years ago

I think it would be nicer if we made parsed and serialized. If characters are already restricted it seems there's some leeway here to fix this.

See also https://github.com/whatwg/mimesniff/issues/30.

annevk commented 6 years ago

I also think we should allow parameters by the way, since some MIME types do not work without them.

annevk commented 6 years ago

@inexorabletash @mkruisselbrink @yutakahirano can we try to make some progress here? Not having a good story for Blob's type basically means MIME types cannot be tested to the extent they deserve: https://github.com/whatwg/mimesniff/issues/42.

annevk commented 6 years ago

@pwnall I heard you might be able to help here as well?

inexorabletash commented 6 years ago

FWIW, I made Chrome align (hopefully) with Firefox about 9 months ago: https://chromium.googlesource.com/chromium/src/+/0b46f3a24bf36d7cf91ec2d88132d15563b85d0e - no more character restrictions on input.

(Also FWIW, it should now be less objectionable to implement this in Blink since it won't involve adding a second MIME parser thanks to internal code reorganization.)

annevk commented 6 years ago

What I want is that a Blob holds a MIME type in an internal slot and type returns that serialized. And to be clear, that MIME type can have parameters as well, otherwise there's severe issues for multipart/* (needs boundary) and most MIME types that need charset.

And then on the input side, if a MIME type cannot be parsed out of the input, we use application/octet-stream or some such as replacement.

I'm willing to write all the tests for this if we can agree on this model.

mkruisselbrink commented 6 years ago

I don't really know enough of the history here (or where these mime types are used outside of the FileAPI spec), but what @sicking said in his earlier comment makes sense to me: either allow anything and just return it back (except maybe in some cases where it is sent over the network), or only allow valid mime types (which I think is what the spec is already trying to say?).

Not sure how @annevk's suggestion lines up with that? You're saying the constructor should only accept valid mime types, but rather than using empty string for invalid input you'd like us to use some arbitrary other mime type as replacement?

I don't like APIs that silently ignore input (which seems to be how invalid mime types are treated, either by replacing with empty string or replacing with some arbitrary replacement), so I'd lean slightly towards just allowing any string and not doing any validation, but presumably there were good reasons why the valid mime type constraints were introduced in the first place? It's unfortunate that we probably can't get away with throwing on invalid mime types...

annevk commented 6 years ago

Using the empty string (which indicates a missing MIME type) is fine. I didn't mean to suggest changing that (if someone wants to suggest that anyway, let's do that in a separate issue).

I think if we parse and then serialize we create less trouble for downstream consumers of these objects (e.g., consumers who might have parsers that could easily be broken by malicious input). And these objects would then also offer a consistent API with when the browser is responsible for generating them as the MIME type would always be valid.

pwnall commented 6 years ago

@annevk Sorry I didn't see your ping until now.

Here's the history that I know -- at least one implementation (WebKit) used to dump the Blob MIME type directly into a HTTP request string when sent via XHR's send(Blob) API or as FormData (I don't remember which case is covered). IIRC, when we fixed that bug in WebKit, we got the spec tightened to only allow characters in \x20-\x7F. We dropped MIME type silently, as opposed to throwing an exception, to reduce the probability of breaking existing content.

FWIW, no matter what we do, we'll need some way to spec what happens in the above cases. Some characters are dangerous (hard to reason about) if used in HTTP headers (newlines, quotes, colons). Rather than creating a footgun that downstream specs and implementations need to handle on a case-by-case basis, I'd prefer that we keep the current spec behavior or tighten it further by having the Blob constructor throw exceptions.

I'm happy to help write (non-normative?) text explaining the need for the charset restriction, or to help adjusting the Blink implementation to whatever is decided here.

annevk commented 6 years ago

https://github.com/whatwg/mimesniff/pull/36 is my new specification for parsing MIME types. Any MIME type record (as defined there) serialized will be HTTP header safe.

So I think we're in agreement, other than on throwing. I strongly suspect throwing is not compatible, but I won't stop you if you want to try.

mkruisselbrink commented 6 years ago

Agreed that throwing unfortunately isn't likely to be compatible. So what you're suggesting sounds good to me.

annevk commented 6 years ago

I have created tests here: https://github.com/w3c/web-platform-tests/pull/7764 (in particular the parsing.any.js resource).

annevk commented 6 years ago

Note that the tests have landed. What is still needed here is that we update the specification. And maybe file dedicated browser bugs for making use of the MIME type parser for Blob object constructors.

andreubotella commented 3 years ago

As I pointed out in #170, the current spec is broken in more ways than you might think, since parameter values in MIME types can be non-ASCII (see whatwg/mimesniff#141), and they also shouldn't be assumed to have the same meaning when lowercased (see whatwg/html#6251). If we just settle on parsing and then serializing, we wouldn't have this kind of problem with unnecessarily limiting valid MIME types.