whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/

XML MIME types used for documents are not interoperable #7420

Open annevk opened 2 years ago

annevk commented 2 years ago

As reported at https://bugzilla.mozilla.org/show_bug.cgi?id=1717560, different browsers have different fixed sets of XML MIME types (rather than adhering to the +xml convention). This can sometimes be used to exploit unaware websites. I'm not inclined to treat this as a browser security problem, but the lack of interop is a browser problem we ought to address.

Examples:

No browser appears to support +xml in general. My inclination is that we should try to fix this and align browsers with the standard. Thoughts on that?
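
For reference, the +xml convention can be sketched as a small check. This is a minimal sketch, assuming the MIME Sniffing Standard's definition of an XML MIME type (subtype ends in `+xml`, or the essence is `text/xml` or `application/xml`). The function name is hypothetical and the Content-Type parsing is deliberately simplified.

```python
def is_xml_mime_type(content_type: str) -> bool:
    """Return True if the Content-Type value names an XML MIME type.

    Assumption: parameters (e.g. "; charset=utf-8") are simply cut off;
    a real parser would follow the full MIME type parsing algorithm.
    """
    essence = content_type.split(";", 1)[0].strip().lower()
    type_, sep, subtype = essence.partition("/")
    if not (type_ and sep and subtype):
        return False  # not of the form type/subtype
    if subtype.endswith("+xml"):
        return True
    return essence in {"text/xml", "application/xml"}
```

Under this definition `application/foo+xml` counts as XML, which is exactly the case where browsers do not currently follow the convention.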

cc @gijsk @mfreed7 @cdumez

domenic commented 2 years ago

So in particular this is about what path we go down in step 9 of https://html.spec.whatwg.org/#process-a-navigate-response, where we have two choices:

  1. "Not explicitly supported XML MIME type": produce an XML document, using the computed navigationParams. (Notably the origin, i.e., the resulting document could be same-origin and thus script-inspectable.)
  2. "Explicitly supported XML MIME type": proceed onward to steps 10 and 11, i.e. either display as an opaque-origin document with custom presentation, or hand off to external application/download.

Currently the spec lets engines decide what MIME types are in the "explicitly supported" set. And that seems pretty reasonable, e.g., Firefox and Safari might have MathML, whereas Chrome would not? So I'm not sure how much interop we want to demand here...

Do we have a complete matrix of all the types we'd want to consider? Maybe we should assemble one by looking at browser source code? Then we can run some tests.

annevk commented 2 years ago

Note that per that definition it would mean that application/mathml+xml is an explicitly supported XML MIME type for Chrome and Safari (because they download), but not Firefox (because it does the normal document thing). And that all browsers have application/{random}+xml as explicitly supported XML MIME types... In general I would expect that browsers don't have explicitly supported XML MIME types. That was mainly for RSS from what I remember.

domenic commented 2 years ago

Yeah, I think the term "explicitly supported" is bad, but I think maybe it's OK to have browser-dependent behavior as to whether things are downloaded/displayed-as-plugin vs. displayed-as-potentially-same-origin XML? But I dunno, maybe we could tighten that up a bit, or flip the default.

annevk commented 2 years ago

Apart from RSS and perhaps XML formats that a native app consumes I have a hard time coming up with examples. I think I would prefer a world whereby we have a set of XML MIME types that always results in a document and all others result in a download/plugin/dispatch-to-native.
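
A rough sketch of that preferred model, for illustration only. The thread does not say which types would be in the "always a document" set, so `ALWAYS_DOCUMENT` below is a hypothetical placeholder filled with the XML types that the test table later in this thread reports as executing in all three browsers. The names `Outcome` and `outcome_for_xml_type` are also assumptions.

```python
from enum import Enum


class Outcome(Enum):
    DOCUMENT = "document"
    HANDOFF = "download/plugin/dispatch-to-native"


# Hypothetical fixed set; the actual membership is what this issue would decide.
ALWAYS_DOCUMENT = frozenset({
    "application/xml",
    "text/xml",
    "application/xhtml+xml",
    "image/svg+xml",
})


def outcome_for_xml_type(content_type: str) -> Outcome:
    """Map an XML MIME type to an outcome under the fixed-set model."""
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence in ALWAYS_DOCUMENT:
        return Outcome.DOCUMENT
    return Outcome.HANDOFF
```

The point of the model is that the answer depends only on a shared, specified set, not on per-engine lists.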

domenic commented 2 years ago

IMO best next steps are to try to determine how large the divergence is. We can do this either by code inspection or blackbox testing.

I'm trying to get help finding the Chromium code; if anyone from Mozilla or WebKit could help with those that'd also be lovely. /cc @cdumez. (Context: trying to figure out whether we can/should make browsers interoperable on which XML MIME types trigger the XML tree viewer vs. downloads vs. any other option.)

For blackbox testing, I created https://cool-massive-appendix.glitch.me/xml?contentType=application/xml , where you can change the contentType parameter. Remember to encode + as %2B. Basic results so far:

domenic commented 2 years ago

@jeremyroman found some of the relevant Chromium code for me:

None of this yet explains why some cases end up downloaded. I suspect that happens in earlier code.

smaug---- commented 2 years ago

Not including media and image types, this is what Gecko does:

(1) (X)HTML documents: text/html, application/x-view-source, application/xhtml+xml, application/vnd.wap.xhtml+xml (application/x-view-source is converted internally to text/plain)

(2) XML document: text/xml, application/xml, application/mathml+xml, application/rdf+xml, text/rdf (image/svg+xml is handled similarly to items in this group when loaded as a document.)

(3) Plain text: text/plain, text/css, text/cache-manifest, text/vtt, application/javascript, application/x-javascript, text/ecmascript, application/ecmascript, text/javascript, application/json (gets json viewer), text/json

(1) and (2) may execute scripts (except view-source which is converted to text/plain)

Random text/foo or text/foo+xml is downloaded.
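
The grouping above can be written down as a lookup table, which makes it easier to compare against other engines. This is only a transcription of the lists in this comment, not Gecko's actual code; the names are hypothetical, and placing `image/svg+xml` in the XML group reflects the "handled similarly" note.

```python
GECKO_GROUPS = {
    "html": {
        "text/html", "application/x-view-source",
        "application/xhtml+xml", "application/vnd.wap.xhtml+xml",
    },
    "xml": {
        "text/xml", "application/xml", "application/mathml+xml",
        "application/rdf+xml", "text/rdf",
        "image/svg+xml",  # handled similarly when loaded as a document
    },
    "plain text": {
        "text/plain", "text/css", "text/cache-manifest", "text/vtt",
        "application/javascript", "application/x-javascript",
        "text/ecmascript", "application/ecmascript", "text/javascript",
        "application/json", "text/json",
    },
}


def gecko_category(essence: str) -> str:
    """Return the group for a MIME type essence, or "unlisted".

    Unlisted types such as text/foo or text/foo+xml are reported
    above as being downloaded.
    """
    for group, types in GECKO_GROUPS.items():
        if essence in types:
            return group
    return "unlisted"


def may_execute_scripts(essence: str) -> bool:
    """Groups (1) and (2) may run scripts, except view-source."""
    if essence == "application/x-view-source":
        return False
    return gecko_category(essence) in {"html", "xml"}
```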

XML viewer mode is used if the document is parsed as an XHTML/XML document, contains neither XHTML nor SVG elements, and has no style (CSS or XSLT) link from either a header or a processing instruction.
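
As an illustration of that condition (not Gecko's implementation), here is a minimal sketch using Python's standard `xml.dom.minidom`. The function name and the `has_style_link_header` parameter are hypothetical; the caller is assumed to have already inspected the response headers for a stylesheet link.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def would_use_xml_viewer(xml_text: str, has_style_link_header: bool = False) -> bool:
    """Approximate the described rule for showing the XML tree viewer."""
    try:
        doc = minidom.parseString(xml_text)
    except ExpatError:
        return False  # not well-formed, so not parsed as an XML document
    if has_style_link_header:
        return False
    # Any XHTML or SVG element anywhere in the tree disables the viewer.
    for ns in (XHTML_NS, SVG_NS):
        if doc.getElementsByTagNameNS(ns, "*"):
            return False
    # A stylesheet processing instruction in the prolog also disables it.
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            return False
    return True
```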

josepharhar commented 2 years ago

I made a test page with all of the MIME types from the last comment loaded into iframes, and there are definitely some differences in what the browsers render: https://volcano-raspy-lead.glitch.me/ WebKit and Chromium seem to be mostly similar. Neither WebKit's nor Chromium's XML tree viewer runs in iframes, though :(

To add to domenic's comment, Chromium and WebKit also convert application/rss+xml and application/atom+xml to text/plain very early in the network stack, due to a security bug from 2009. There is a highly starred Chrome bug asking to get rid of this behavior and allow these MIME types to be rendered with the XML tree viewer.

Here is a table for some MIME types ending with +xml:

| content-type | Firefox | Chromium | WebKit |
| --- | --- | --- | --- |
| application/rss+xml | download | plain text | plain text |
| application/atom+xml | download | plain text | plain text |
| application/mathml+xml | execute | download | download |
| application/foo+xml | download | download | download |

> None of this yet explains why some cases end up downloaded. I suspect that happens in earlier code.

Yeah, I'm not sure where that code in Chromium is either.

Another thing worth considering, which I found while trying to address the highly starred open Chrome bug: the XML tree viewer in Chromium actually executes the XML file before rendering the tree viewer. So I guess that for MIME types we believe are a security issue, the only options are rendering as plain text and downloading...?

josepharhar commented 2 years ago

Here is a more exhaustive table of test cases. I'm not going to look at whether the XML tree viewer is opened in this analysis, just executed vs plain text vs download when navigating directly to an XHTML document.

content-types with non-interoperable behavior:

| content-type | firefox | chromium | webkit |
| --- | --- | --- | --- |
| application/rss+xml | download | plain text | plain text |
| application/atom+xml | download | plain text | plain text |
| application/mathml+xml | execute | download | download |
| application/x-view-source | plain text? | download | download |
| application/vnd.wap.xhtml+xml | execute | download | execute |
| application/rdf+xml | execute | download | download |
| text/rdf | execute | download | download |
| text/foo | download | plain text | plain text |
| text/foo+xml | download | plain text | plain text |

content-types with interoperable behavior:

| content-type | behavior |
| --- | --- |
| application/foo+xml | download |
| application/xml | execute |
| text/plain | plain text |
| text/html | execute |
| application/xhtml+xml | execute |
| text/xml | execute |
| image/svg+xml | execute |
| text/css | plain text |
| text/cache-manifest | plain text |
| text/vtt | plain text |
| application/javascript | plain text |
| application/x-javascript | plain text |
| text/ecmascript | plain text |
| application/ecmascript | plain text |
| text/javascript | plain text |
| application/json | plain text |
| text/json | plain text |
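
To make the results easier to query, the two tables can be transcribed into one mapping. This is a minimal sketch; the data is copied from the tables above (including the uncertain "plain text?" entry), while the variable and function names are hypothetical.

```python
# (firefox, chromium, webkit) behavior per content type, from the first table.
DIVERGENT = {
    "application/rss+xml": ("download", "plain text", "plain text"),
    "application/atom+xml": ("download", "plain text", "plain text"),
    "application/mathml+xml": ("execute", "download", "download"),
    "application/x-view-source": ("plain text?", "download", "download"),
    "application/vnd.wap.xhtml+xml": ("execute", "download", "execute"),
    "application/rdf+xml": ("execute", "download", "download"),
    "text/rdf": ("execute", "download", "download"),
    "text/foo": ("download", "plain text", "plain text"),
    "text/foo+xml": ("download", "plain text", "plain text"),
}

# Single shared behavior per content type, from the second table.
SHARED = {
    "application/foo+xml": "download", "application/xml": "execute",
    "text/plain": "plain text", "text/html": "execute",
    "application/xhtml+xml": "execute", "text/xml": "execute",
    "image/svg+xml": "execute", "text/css": "plain text",
    "text/cache-manifest": "plain text", "text/vtt": "plain text",
    "application/javascript": "plain text",
    "application/x-javascript": "plain text",
    "text/ecmascript": "plain text", "application/ecmascript": "plain text",
    "text/javascript": "plain text", "application/json": "plain text",
    "text/json": "plain text",
}

RESULTS = {**DIVERGENT, **{ct: (b, b, b) for ct, b in SHARED.items()}}


def non_interoperable(results):
    """Content types where the three browsers do not all agree."""
    return sorted(ct for ct, row in results.items() if len(set(row)) > 1)


def plus_xml_executed_everywhere(results):
    """+xml content types that every tested browser executes."""
    return sorted(ct for ct, row in results.items()
                  if ct.endswith("+xml") and set(row) == {"execute"})


print(non_interoperable(RESULTS))
print(plus_xml_executed_everywhere(RESULTS))
```

The second query returns only `application/xhtml+xml` and `image/svg+xml`, consistent with the observation at the top of the thread that no browser supports +xml in general.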