XML MIME types used for documents are not interoperable

annevk commented 2 years ago

As reported at https://bugzilla.mozilla.org/show_bug.cgi?id=1717560 different browsers have different fixed XML MIME type sets (rather than adhering to the +xml convention). This can sometimes be used to exploit unaware websites. I'm not inclined to treat this as a browser security problem, but the lack of interop is a browser problem we ought to address.

Examples:

data:application/atom+xml,<script xmlns="http://www.w3.org/1999/xhtml">alert(1)</script> executes in Chrome and Safari, downloads in Firefox.
data:application/mathml+xml,<script xmlns="http://www.w3.org/1999/xhtml">alert(1)</script> executes in Firefox, downloads in Chrome and Safari.

No browser appears to support +xml in general. My inclination is that we should try to fix this and align browsers with the standard. Thoughts on that?
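
For reference, the "+xml convention" mentioned above matches how the MIME Sniffing Standard defines an XML MIME type: the subtype ends in +xml, or the essence is text/xml or application/xml. A minimal sketch of that check follows. The helper names are hypothetical, and the parameter stripping is a naive split rather than a real MIME type parser; this is not any browser's implementation.

```python
def essence(mime_type: str) -> str:
    """Return the lowercased type/subtype, dropping any parameters (naive split)."""
    return mime_type.split(";", 1)[0].strip().lower()


def is_xml_mime_type(mime_type: str) -> bool:
    """True if the subtype ends in +xml, or the essence is text/xml or application/xml."""
    ess = essence(mime_type)
    if ess in ("text/xml", "application/xml"):
        return True
    type_, sep, subtype = ess.partition("/")
    return bool(type_) and sep == "/" and subtype.endswith("+xml")


# Both types from the examples above are XML MIME types under this definition,
# even though browsers currently disagree on how to handle them.
for candidate in ("application/atom+xml", "application/mathml+xml", "text/html"):
    print(candidate, is_xml_mime_type(candidate))
```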

cc @gijsk @mfreed7 @cdumez

domenic commented 2 years ago

So in particular this is about what path we go down in step 9 of https://html.spec.whatwg.org/#process-a-navigate-response, where we have two choices:

"Not explicitly supported XML MIME type": produce an XML document, using the computed navigationParams. (Notably the origin, i.e., the resulting document could be same-origin and thus script-inspectable.)
"Explicitly supported XML MIME type": proceed onward to steps 10 and 11, i.e. either display as an opaque-origin document with custom presentation, or hand off to external application/download.

Currently the spec lets engines decide what MIME types are in the "explicitly supported" set. And that seems pretty reasonable, e.g., Firefox and Safari might have MathML, whereas Chrome would not? So I'm not sure how much interop we want to demand here...

Do we have a complete matrix of all the types we'd want to consider? Maybe we should assemble one by looking at browser source code? Then we can run some tests.

annevk commented 2 years ago

Note that per that definition it would mean that application/mathml+xml is an explicitly supported XML MIME type for Chrome and Safari (because they download), but not Firefox (because it does the normal document thing). And that all browsers have application/{random}+xml as explicitly supported XML MIME types... In general I would expect that browsers don't have explicitly supported XML MIME types. That was mainly for RSS from what I remember.

domenic commented 2 years ago

Yeah, I think the term "explicitly supported" is bad, but I think maybe it's OK to have browser-dependent behavior as to whether things are downloaded/displayed-as-plugin vs. displayed-as-potentially-same-origin XML? But I dunno, maybe we could tighten that up a bit, or flip the default.

annevk commented 2 years ago

Apart from RSS and perhaps XML formats that a native app consumes I have a hard time coming up with examples. I think I would prefer a world whereby we have a set of XML MIME types that always results in a document and all others result in a download/plugin/dispatch-to-native.

domenic commented 2 years ago

IMO best next steps are to try to determine how large the divergence is. We can do this either by code inspection or blackbox testing.

I'm trying to get help finding the Chromium code; if anyone from Mozilla or WebKit could help with those that'd also be lovely. /cc @cdumez. (Context: trying to figure out whether we can/should make browsers interoperable on which XML MIME types trigger the XML tree viewer vs. downloads vs. any other option.)

For blackbox testing, I created https://cool-massive-appendix.glitch.me/xml?contentType=application/xml , where you can change the contentType parameter. Remember to encode + as %2B. Basic results so far:

application/xml: tree viewer everywhere
text/xml: tree viewer everywhere
application/foo+xml: download everywhere
application/rss+xml: download in Firefox; text document (!) in Chrome and Safari
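
To make the "encode + as %2B" step concrete, here is a small sketch that builds the probe URLs for the test page above. The base URL and the contentType parameter come from this comment; the helper name is made up for illustration.

```python
from urllib.parse import quote

# Base URL taken from the comment above; probe_url is a hypothetical helper.
BASE = "https://cool-massive-appendix.glitch.me/xml"


def probe_url(content_type: str) -> str:
    """Build a test URL, percent-encoding '+' so it is not decoded as a space."""
    # safe="/" keeps the slash readable; '+' is not in the safe set, so it becomes %2B.
    return f"{BASE}?contentType={quote(content_type, safe='/')}"


for ct in ("application/xml", "text/xml", "application/foo+xml", "application/rss+xml"):
    print(probe_url(ct))
```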

domenic commented 2 years ago

@jeremyroman found some of the relevant Chromium code for me:

MIME sniffing
- Any text/* except text/html, text/xml, and text/xsl will end up as a text document. So e.g. text/foo+xml is a text document.
- Otherwise we use IsXMLMimeType which has special cases for text/xml, text/xsl, application/xml, and then uses a complicated RFC-based parser to count certain x/y+xml MIME types as XML.

XML tree viewer vs. not
- If there's an XSL transform, use that
- If there's no error (presumably an XML parsing error?) and no CSS and it's not SVG and we didn't see elements in "known namespaces", then use XML viewer mode
- Otherwise... not sure exactly what the fallback is, but I suspect it could spit out either text, HTML, or SVG documents.
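
The two lists above can be read as roughly the following decision logic. This is only a sketch of the description in this comment, not actual Chromium code: the function names are hypothetical, the endswith("+xml") check is a stand-in assumption for IsXMLMimeType's RFC-based parsing, and it says nothing about which cases end up downloaded.

```python
def chromium_mime_bucket(essence: str) -> str:
    """Rough bucket for a MIME essence, following the description above."""
    essence = essence.lower()
    type_, _, subtype = essence.partition("/")
    if type_ == "text" and essence not in {"text/html", "text/xml", "text/xsl"}:
        return "text document"  # e.g. text/foo+xml
    # Assumption: a plain suffix check approximates IsXMLMimeType here.
    if essence in {"text/xml", "text/xsl", "application/xml"} or subtype.endswith("+xml"):
        return "xml"
    return "other"


def uses_xml_tree_viewer(has_xslt: bool, has_error: bool, has_css: bool,
                         is_svg: bool, saw_known_namespace: bool) -> bool:
    """The tree viewer is used only when nothing else claims the document."""
    if has_xslt:
        return False  # the XSL transform is used instead
    return not (has_error or has_css or is_svg or saw_known_namespace)


print(chromium_mime_bucket("text/foo+xml"))
print(chromium_mime_bucket("application/xml"))
```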

None of this yet explains why some cases end up downloaded. I suspect that happens in earlier code.

smaug---- commented 2 years ago

Not including media and image types, this is what Gecko does:

(1) (X)HTML documents: text/html, application/x-view-source, application/xhtml+xml, application/vnd.wap.xhtml+xml (application/x-view-source is converted internally to text/plain)

(2) XML document: text/xml, application/xml, application/mathml+xml, application/rdf+xml, text/rdf (image/svg+xml is handled similarly to items in this group when loaded as a document.)

(3) Plain text: text/plain, text/css, text/cache-manifest, text/vtt, application/javascript, application/x-javascript, text/ecmascript, application/ecmascript, text/javascript, application/json (gets json viewer), text/json

(1) and (2) may execute scripts (except view-source which is converted to text/plain)

Random text/foo or text/foo+xml is downloaded.

XML viewer mode is used if the document is parsed as an XHTML/XML document, has no XHTML or SVG elements, and has no style (CSS or XSLT) link from either a header or a processing instruction.
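
The grouping above can be written down as a lookup, which makes it easier to compare against other engines. This is a sketch transcribed from this comment, not Gecko source; the names are hypothetical, and the "not listed" fallback is an assumption, since the comment only says that random text/foo and text/foo+xml types are downloaded and it excludes media and image types.

```python
# Sets transcribed from the comment above; all names here are hypothetical.
XHTML_TYPES = {"text/html", "application/x-view-source",
               "application/xhtml+xml", "application/vnd.wap.xhtml+xml"}
XML_TYPES = {"text/xml", "application/xml", "application/mathml+xml",
             "application/rdf+xml", "text/rdf"}
PLAIN_TEXT_TYPES = {"text/plain", "text/css", "text/cache-manifest", "text/vtt",
                    "application/javascript", "application/x-javascript",
                    "text/ecmascript", "application/ecmascript",
                    "text/javascript", "application/json", "text/json"}


def gecko_document_kind(essence: str) -> str:
    """Map a MIME essence to the group described in the comment."""
    e = essence.lower()
    if e in XHTML_TYPES:
        return "(X)HTML"
    if e in XML_TYPES or e == "image/svg+xml":  # SVG is handled like group (2)
        return "XML"
    if e in PLAIN_TEXT_TYPES:
        return "plain text"
    if e.startswith("text/"):
        return "download"  # random text/foo or text/foo+xml
    return "not listed"  # assumption: the comment does not cover these


def may_execute_scripts(essence: str) -> bool:
    """Groups (1) and (2) may run scripts, except view-source (shown as text/plain)."""
    e = essence.lower()
    if e == "application/x-view-source":
        return False
    return gecko_document_kind(e) in ("(X)HTML", "XML")
```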

josepharhar commented 2 years ago

I made a test page with all of the MIME types from the last comment thrown into iframes, and there are definitely some differences in what the browsers render: https://volcano-raspy-lead.glitch.me/

WebKit and Chromium seem to be mostly similar. WebKit's and Chromium's XML tree viewers don't run in iframes though :(

To add to domenic's comment, Chromium and WebKit also have behavior where application/rss+xml and application/atom+xml get converted to text/plain very early in the network stack due to a security bug from 2009. There is a Chrome bug with a high number of stars to get rid of this behavior and allow these MIME types to be rendered with the XML tree viewer.

Here is a table for some MIME types ending with +xml:

content-type	Firefox	Chromium	WebKit
application/rss+xml	download	plain text	plain text
application/atom+xml	download	plain text	plain text
application/mathml+xml	execute	download	download
application/foo+xml	download	download	download

> None of this yet explains why some cases end up downloaded. I suspect that happens in earlier code.

Yeah, I'm not sure where that code in Chromium is either.

Another thing worth considering, which I found while trying to address the highly starred open Chrome bug, is that the XML tree viewer in Chromium actually executes the XML file before rendering the tree viewer. So I guess that for MIME types we believe are a security issue, the only options are rendering as plain text and downloading...?

josepharhar commented 2 years ago

Here is a more exhaustive table of test cases. I'm not going to look at whether the XML tree viewer is opened in this analysis, just executed vs plain text vs download when navigating directly to an XHTML document.

content-types with non-interoperable behavior:

content-type	firefox	chromium	webkit
application/rss+xml	download	plain text	plain text
application/atom+xml	download	plain text	plain text
application/mathml+xml	execute	download	download
application/x-view-source	plain text?	download	download
application/vnd.wap.xhtml+xml	execute	download	execute
application/rdf+xml	execute	download	download
text/rdf	execute	download	download
text/foo	download	plain text	plain text
text/foo+xml	download	plain text	plain text

content-types with interoperable behavior:

content-type	behavior
application/foo+xml	download
application/xml	execute
text/plain	plain text
text/html	execute
application/xhtml+xml	execute
text/xml	execute
image/svg+xml	execute
text/css	plain text
text/cache-manifest	plain text
text/vtt	plain text
application/javascript	plain text
application/x-javascript	plain text
text/ecmascript	plain text
application/ecmascript	plain text
text/javascript	plain text
application/json	plain text
text/json	plain text
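
The split between the two tables above can be reproduced mechanically from per-browser results. A sketch follows, using a handful of rows copied from the tables; the dictionary layout and helper name are made up for illustration, and interoperable rows are expanded to the same behavior for all three engines.

```python
# A subset of rows from the tables above, as (firefox, chromium, webkit).
RESULTS = {
    "application/rss+xml": ("download", "plain text", "plain text"),
    "application/mathml+xml": ("execute", "download", "download"),
    "application/vnd.wap.xhtml+xml": ("execute", "download", "execute"),
    "text/foo+xml": ("download", "plain text", "plain text"),
    "application/foo+xml": ("download", "download", "download"),
    "application/xml": ("execute", "execute", "execute"),
}


def is_interoperable(behaviors: tuple) -> bool:
    """A content type is interoperable when every engine behaves the same way."""
    return len(set(behaviors)) == 1


non_interoperable = sorted(ct for ct, b in RESULTS.items() if not is_interoperable(b))
interoperable = sorted(ct for ct, b in RESULTS.items() if is_interoperable(b))

print("non-interoperable:", non_interoperable)
print("interoperable:", interoperable)
```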

whatwg / html

XML MIME types used for documents are not interoperable #7420