Open Manishearth opened 7 years ago
See also #13
FWIW, Chrome throws on non-ASCII in the ctor, e.g.
new Blob([], {type: '\x80'})
I've had a local patch to align with the spec (don't throw, just replace type with empty string if it contains anything outsize U+0020 ... U+007E, and have getter return empty on nonparsable MIME type) but I haven't landed because I agree the spec seems weird (and no-one else implements it as written)
I think the two behaviors that would make sense to me are:
.type
is ignored if not valid when the Blob is sent over the network as part of a FormData
..type
is a valid mimetype. Including that it is restricted to U+0020 ... U+007E, and that it parses as a valid mimetype.I really don't care strongly though.
FWIW, Safari follows the Chrome behavior (allow bad mime types, restrict characters), and IE follows the Firefox behavior (allow all the things)
I think it would be nicer if we made parsed and serialized. If characters are already restricted it seems there's some leeway here to fix this.
I also think we should allow parameters by the way, since some MIME types do not work without them.
@inexorabletash @mkruisselbrink @yutakahirano can we try to make some progress here? Not having a good story for Blob
's type
basically means MIME types cannot be tested to the extent they deserve: https://github.com/whatwg/mimesniff/issues/42.
@pwnall I heard you might be able to help here as well?
FWIW, I made Chrome align (hopefully) with Firefox about 9 months ago: https://chromium.googlesource.com/chromium/src/+/0b46f3a24bf36d7cf91ec2d88132d15563b85d0e - no more character restrictions on input.
(Also FWIW, it should now be less objectionable to implement this in Blink since it won't involve adding a second MIME parser thanks to internal code reorganization.)
What I want is that a Blob
holds a MIME type in an internal slot and type
returns that serialized. And to be clear, that MIME type can have parameters as well, otherwise there's severe issues for multipart/*
(needs boundary
) and most MIME types that need charset
.
And then on the input side, if a MIME type cannot be parsed out of the input, we use application/octet-stream
or some such as replacement.
I'm willing to write all the tests for this if we can agree on this model.
I don't really know enough of the history here (or where these mime types are used outside of the FileAPI spec), but what @sicking said in his earlier comment makes sense to me: either allow anything and just return it back (except maybe in some cases where it is sent over the network), or only allow valid mime types (which I think is what the spec is already trying to say?).
Not sure how @annevk's suggestion lines up with that? You're saying the constructor should only accept valid mime types, but rather than using empty string for invalid input you'd like us to use some arbitrary other mime type as replacement?
I don't like APIs that silently ignore input (which seems to be how invalid mime types are treated, either by replacing with empty string or replacing with some arbitrary replacement), so I'd lean slightly towards just allowing any string and not doing any validation, but presumably there were good reasons why the valid mime type constraints were introduced in the first place? It's unfortunate that we probably can't get away with throwing on invalid mime types...
Using the empty string (which indicates a missing MIME type) is fine. I didn't mean to suggest changing that (if someone wants to suggest that anyway, let's do that in a separate issue).
I think if we parse and then serialize we create less trouble for downstream consumers of these objects (e.g., consumers who might have parsers that could easily be broken by malicious input). And these objects would then also offer a consistent API with when the browser is responsible for generating them as the MIME type would always be valid.
@annevk Sorry I didn't see your ping until now.
Here's the history that I know -- at least one implementation (WebKit) used to dump the Blob MIME type directly into a HTTP request string when sent via XHR's send(Blob) API or as FormData (I don't remember which case is covered). IIRC, when we fixed that bug in WebKit, we got the spec tightened to only allow characters in \x20-\x7F. We dropped MIME type silently, as opposed to throwing an exception, to reduce the probability of breaking existing content.
FWIW, no matter what we do, we'll need some way to spec what happens in the above cases. Some characters are dangerous (hard to reason about) if used in HTTP headers (newlines, quotes, colons). Rather than creating a footgun that downstream specs and implementations need to handle on a case-by-case basis, I'd prefer that we keep the current spec behavior or tighten it further by having the Blob constructor throw exceptions.
I'm happy to help write (non-normative?) text explaining the need for the charset restriction, or to help adjusting the Blink implementation to whatever is decided here.
https://github.com/whatwg/mimesniff/pull/36 is my new specification for parsing MIME types. Any MIME type record (as defined there) serialized will be HTTP header safe.
So I think we're in agreement, other than on throwing. I strongly suspect throwing is not compatible, but I won't stop you if you want to try.
Agreed that throwing unfortunately isn't likely to be compatible. So what you're suggesting sounds good to me.
I have created tests here: https://github.com/w3c/web-platform-tests/pull/7764 (in particular the parsing.any.js resource).
Note that the tests have landed. What is still needed here is that we update the specification. And maybe file dedicated browser bugs for making use of the MIME type parser for Blob
object constructors.
As I pointed out in #170, the current spec is broken in more ways than you might think, since parameter values in MIME types can be non-ASCII (see whatwg/mimesniff#141), and they also shouldn't be assumed to have the same meaning when lowercased (see whatwg/html#6251). If we just settle on parsing and then serializing, we wouldn't have this kind of problem with unnecessarily limiting valid MIME types.
The following code:
works in Chrome and Firefox. It also works if you replace
(
with\x02
or somethingThe spec, however, asks user agents to restrict values in the range
U+0020 to U+007E
in the constructor, and in the getter it only allows a nonempty value to be returned if it is a parseable MIME type.Perhaps we should relax these restrictions in the spec?