whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

MIME type / extension mapping #4459

Open ewilligers opened 5 years ago

ewilligers commented 5 years ago

This issue came up during Web Share Target discussions, in comments about "option 3".

With <input type="file">, and Drag and Drop, and Web Share Target, authors are able to specify the MIME types and/or file extensions that they accept.

When a web site uses a file type not already known to the user agent, there is no way to express an association between the MIME type and the file extension.

One way to specify the mapping would be within an accept string: accept="foo/bar=.foo, .svg".
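To make the proposed syntax concrete, here is a minimal sketch of how such an accept string might be parsed. The function name `parseAcceptWithMapping` and the shape of its result are assumptions for illustration; nothing like this exists in the HTML Standard, and the `type=.ext` token is only the proposal from this comment.

```javascript
// Hypothetical parser for the proposed "type=.ext" accept token.
// Not part of any specification; the token grammar is an assumption.
function parseAcceptWithMapping(accept) {
  const result = { types: [], extensions: [], mappings: [] };
  for (const raw of accept.split(",")) {
    const token = raw.trim().toLowerCase();
    if (token === "") continue;
    // Accept tokens carry no MIME parameters, so "=" can only be the mapping separator.
    const eq = token.indexOf("=");
    if (eq !== -1) {
      // "foo/bar=.foo" declares that .foo files have the type foo/bar.
      const type = token.slice(0, eq).trim();
      const extension = token.slice(eq + 1).trim();
      result.mappings.push({ type, extension });
      result.types.push(type);
      result.extensions.push(extension);
    } else if (token.startsWith(".")) {
      result.extensions.push(token);
    } else {
      result.types.push(token);
    }
  }
  return result;
}
```

A paired token contributes to both lists, so a consumer that ignores the pairing would see the same set of accepted types and extensions as with `accept="foo/bar, .foo, .svg"`.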

Another option would be to allow a mapping within the web application manifest:

    "extension_mapping": [
      { "extension": ".foo", "content_type": "foo/bar" },
      { "extension": ".svg", "content_type": "image/svg+xml" }
    ]

Then a web site could use foo/bar for file input, Drag and Drop, and Web Share Target, and the user agent would know that .foo files have this MIME type.
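As a rough illustration of how a user agent or library might consume such a manifest member, the sketch below indexes the entries in both directions. `extension_mapping` is only the member proposed in this comment, and `indexExtensionMapping` is a hypothetical helper, not an existing API.

```javascript
// Hypothetical: index the proposed manifest member by extension and by type.
function indexExtensionMapping(manifest) {
  const byExtension = new Map();
  const byType = new Map();
  for (const entry of manifest.extension_mapping ?? []) {
    const extension = String(entry.extension).toLowerCase();
    const type = String(entry.content_type).toLowerCase();
    if (!extension.startsWith(".")) continue; // skip malformed entries
    byExtension.set(extension, type);
    if (!byType.has(type)) byType.set(type, []);
    byType.get(type).push(extension);
  }
  return { byExtension, byType };
}

const index = indexExtensionMapping({
  extension_mapping: [
    { extension: ".foo", content_type: "foo/bar" },
    { extension: ".svg", content_type: "image/svg+xml" },
  ],
});
```

Here `index.byExtension.get(".foo")` yields `"foo/bar"`, and `index.byType.get("foo/bar")` yields the list of extensions declared for that type.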

jakearchibald commented 5 years ago

I prefer the accept attribute solution. Putting it in the manifest creates a race condition.

domenic commented 5 years ago

I agree that the manifest seems like a big issue unless we block loading the web page on the manifest downloading.

I'm unsure why this mapping is necessary, instead of separately specifying them as all accepted: accept="foo/bar, .foo, .svg". This seems like it would give equivalent functionality for file input and drag and drop. But I may be missing something.

raymeskhoury commented 5 years ago

@domenic the issue is discussed at length here but basically the problem trying to be solved is that if developers only specify file extensions, but the platform only uses mime types (e.g. android), then their site won't work well on that platform. There are different ways of solving that, but one would be to force developers to specify a mapping such that each file extension corresponds to a mime type. There are other options listed in the bug linked above.

@jakearchibald pointed out that the problem isn't specific to Web Share Target which is why we filed the bug here.

jakearchibald commented 5 years ago

Creating a mapping allows the browser to warn on types it couldn't map automatically.

Also, it feels like it should set the type of the returned File.

domenic commented 5 years ago

Is the plan then to disallow supplying only extensions or only MIME types, and require specifying pairs?

jakearchibald commented 5 years ago

I think that would be a compat problem, right?

domenic commented 5 years ago

Yeah. In which case I don't understand why adding a new feature foo/bar=.svg, which is equivalent to foo/bar, .svg, helps anything.

jakearchibald commented 5 years ago

I think there are two benefits:

  1. The correct type can be set on the resulting File.
  2. The browser can warn via console if an extension/mimetype is provided that it doesn't have an internal mapping for.

@ewilligers @raymeskhoury is the above correct?

I don't have a strong feeling on whether it's worth the change though.

domenic commented 5 years ago

Perhaps I'm not understanding.

  1. <input type=file> has never set the type on the resulting File. Is this a new feature request? If so, it seems to be for a developer convenience, right? Because the developer could always do this themselves using the filename property.
  2. Why are browsers' internal mappings involved at all? On an operating system which doesn't use MIME types, but instead only file extensions, why would such mappings even exist?

raymeskhoury commented 5 years ago

> the problem trying to be solved is that if developers only specify file extensions, but the platform only uses mime types (e.g. android), then their site won't work well on that platform.

First there is a question of whether this is a problem that comes up in practice. I'm not sure it is but others may feel more strongly than I do (I think @mounirlamouri originally raised it).

> Why are browsers' internal mappings involved at all?

The idea of an automatic mapping is that you could do an automatic conversion of common file extensions to mime types (or vice-versa). This would solve the problem above without developers needing to specify a mapping every time.

> Because the developer could always do this themselves using the filename property.

Not necessarily. What if a file of a particular mime type was passed in without an extension? In that case, the website couldn't detect the mime type from the extension.

jakearchibald commented 5 years ago

@domenic

> 1. <input type=file> has never set the type on the resulting File. Is this a new feature request?

Something sets the type, and I've seen differences between different file input methods.

https://mime-test.glitch.me/foo.jpg - this is a gif, with a .jpg extension, served with a foo/bar mimetype. Android seems to remember it's a foo/bar type once it's in the file system, so it seems like mimetypes are somewhat first class.

https://share-target-demo.glitch.me/file/ - here's a simple share target. When I share the foo/bar file to this app, it sees it as type foo/bar.

There's an input on the page that has accept="foo/bar". However, that doesn't seem to limit the Android file picker to that file type. If I select the foo/bar file this way, it sees it as image/jpeg, suggesting it's using the file extension.

So two different file input methods result in a File with a different mimetype.

> If so, it seems to be for a developer convenience, right? Because the developer could always do this themselves using the filename property.

I guess they could:

const fileCopy = new File([originalFile], newName, { type: newType });

So yes, I guess it's a convenience.
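A slightly fuller sketch of that do-it-yourself approach follows. The helper names and the lookup table are hypothetical, and treating a leading-dot name such as ".foo" as having no extension is an assumption. As noted earlier in the thread, this only works when the file name actually carries an extension.

```javascript
// Hypothetical helper: pick a MIME type for a file name from a site-supplied table.
function typeFromName(name, table) {
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return null; // no extension, or a dotfile such as ".foo" (assumption)
  return table[name.slice(dot).toLowerCase()] ?? null;
}

// Returns a copy carrying the looked-up type, or the original file when nothing changes.
function withMappedType(file, table) {
  const type = typeFromName(file.name, table);
  if (type === null || type === file.type) return file;
  // The first argument to File must be an array of parts.
  return new File([file], file.name, { type, lastModified: file.lastModified });
}
```

For example, `withMappedType(input.files[0], { ".foo": "foo/bar" })` would return a retyped copy for a file named `photo.foo`, and the untouched original for a file named `README`.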

> 2. Why are browsers' internal mappings involved at all? On an operating system which doesn't use MIME types, but instead only file extensions, why would such mappings even exist?

Mappings are already involved. Eg if you have <input type="file" accept="image/jpeg">, the browser maps that back to .jpg on extension-based filesystems.
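To show where a mapping becomes necessary, here is a page-side sketch of matching a file against an accept string. It is a simplification and not how any browser implements its picker: `matchesAccept` is a hypothetical name, and the wildcard branch accepts any `x/*` token rather than only the `audio/*`, `video/*` and `image/*` forms the spec allows.

```javascript
// Sketch: does a File-like object satisfy at least one accept token?
function matchesAccept(file, accept) {
  const name = file.name.toLowerCase();
  const type = (file.type || "").toLowerCase();
  return accept
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .filter((token) => token !== "")
    .some((token) => {
      if (token.startsWith(".")) return name.endsWith(token); // extension token
      if (token.endsWith("/*")) return type.startsWith(token.slice(0, -1)); // e.g. "image/"
      return type === token; // exact MIME type token
    });
}
```

A file that arrives with an empty `type` can only match extension tokens here, which is the gap an extension-to-type table fills on extension-based systems.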

domenic commented 5 years ago

Indeed, I'm wrong on both counts.

  1. Per spec, "the user agent must queue a task to first update the element's selected files so that it represents the user's new selection, ..." Apparently this includes a MIME type, as shown by http://software.hixie.ch/utilities/js/live-dom-viewer/?saved=6822

  2. Per spec, "User agents should prevent the user from selecting files that are not accepted by one (or more) of these tokens", which is followed by examples talking about how "the system's type registration table" can be used.

Apologies.

I still remain unclear on the benefits in https://github.com/whatwg/html/issues/4459#issuecomment-478996265. It seems like (1) would apply only on extension-based systems, where the file has an extension that is not in the registration table, and even then it would provide a minor convenience. (2) seems to apply only for MIME-type-based systems (right?), and can be done already: if someone specifies <input type=file accept=".foo"> the browser can warn that without providing MIME types, you've made it unable to properly filter files.

jakearchibald commented 5 years ago

> (1) would apply only on extension-based systems, where the file has an extension that is not in the registration table, and even then it would provide a minor convenience.

Agreed.

> (2) seems to apply only for MIME-type-based systems (right?), and can be done already: if someone specifies <input type=file accept=".foo"> the browser can warn that without providing MIME types, you've made it unable to properly filter files.

Consider:

<input type=file accept=".foo, image/foo">

In this case it's unclear whether the above is an extension and its equivalent mime, or an extension with a missing equivalent mime, and a mime with a missing equivalent extension.
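The ambiguity can be sketched with a toy checker. The table contents and the name `unpairedTokens` are assumptions for illustration; real registration tables are platform-specific.

```javascript
// Hypothetical registration table mapping extensions to MIME types.
const knownTypes = new Map([
  [".jpg", "image/jpeg"],
  [".png", "image/png"],
]);

// Lists accept tokens that the table cannot translate in either direction.
function unpairedTokens(accept, table) {
  const tokens = accept
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .filter((token) => token !== "");
  const extensions = tokens.filter((token) => token.startsWith("."));
  const types = tokens.filter((token) => !token.startsWith(".") && !token.endsWith("/*"));
  const knownMimeTypes = new Set(table.values());
  return {
    // Extensions with no known MIME type.
    extensions: extensions.filter((extension) => !table.has(extension)),
    // MIME types with no known extension.
    types: types.filter((type) => !knownMimeTypes.has(type)),
  };
}
```

For `".foo, image/foo"` this reports both tokens as unpaired. Without an explicit pairing syntax, the checker cannot tell whether the author meant them as one type or as two unrelated ones.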

domenic commented 5 years ago

> In this case it's unclear whether the above is an extension and its equivalent mime, or an extension with a missing equivalent mime, and a mime with a missing equivalent extension.

I agree. But what does that impact?

jakearchibald commented 5 years ago

Only the browser's ability to show a console warning about platform compatibility.

@ewilligers @raymeskhoury am I underselling the benefits here? Is there anything I'm missing?

raymeskhoury commented 5 years ago

Originally this came up in the discussion of Web Share Target. In that case, because it was a new manifest attribute, there was the opportunity to actually have it fail to parse the manifest which could have made it slightly more useful (maybe).

But I agree, I think warning is the best we could do here.

mounirlamouri commented 5 years ago

> the problem trying to be solved is that if developers only specify file extensions, but the platform only uses mime types (e.g. android), then their site won't work well on that platform.

> First there is a question of whether this is a problem that comes up in practice. I'm not sure it is but others may feel more strongly than I do (I think @mounirlamouri originally raised it).

In the context of web share, Android requires you to give a list of mime types for intent handlers, not file extensions, so it would matter. In the context of <input type='file'>, presumably, the browser can read the file extensions.

annevk commented 5 years ago

If you declare a mapping between .foo and text/foo, what does this mean?

  1. What if the file system gives you a .foo it considers a text/bar? (This seems to apply to Android.)
  2. What if the file system gives you a .bar it considers a text/foo? (This seems to apply to Android.)
  3. What if the file system gives you a .foo you consider a text/bar? (I believe this applies to browsers on some systems.)
  4. What if the file system gives you a .bar you consider a text/foo? (I believe this applies to browsers on some systems.)
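These cases can be named in a small sketch. It does not answer the question of what a user agent should do in each case; it only classifies how an observed file relates to a declared pair. The function name and the returned labels are hypothetical.

```javascript
// Sketch: compare a declared (extension, type) pair with what the
// file system or browser reports for an actual file.
function compareWithDeclared(declared, observed) {
  const sameExtension = declared.extension === observed.extension;
  const sameType = declared.type === observed.type;
  if (sameExtension && sameType) return "consistent";
  if (sameExtension) return "extension matches, type differs"; // cases 1 and 3
  if (sameType) return "type matches, extension differs"; // cases 2 and 4
  return "unrelated";
}

const declaredPair = { extension: ".foo", type: "text/foo" };
```

Calling `compareWithDeclared(declaredPair, { extension: ".foo", type: "text/bar" })` lands in the first conflicting branch. Whether the declared or the observed type should win is exactly the open question.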