whatwg / fetch

Fetch Standard
https://fetch.spec.whatwg.org/

Restrict allowed JavaScript MIME types #870

Open evilpie opened 5 years ago

evilpie commented 5 years ago

I am cautiously optimistic that we can change the handling of JavaScript MIME types from a blocklist to an allowlist.

This list would include all the JavaScript MIME types, plus text/html, application/json, text/plain and empty (no Content-Type).
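As a rough illustration of what such an allowlist check might look like, here is a minimal sketch. The JavaScript MIME type essences are the ones enumerated in the MIME Sniffing standard; the function names `essence` and `is_allowed_for_script` are hypothetical, and the parsing is deliberately simplified (a real implementation would use the standard's MIME type parser).

```python
# JavaScript MIME type essences as enumerated in the WHATWG MIME Sniffing standard.
JAVASCRIPT_MIME_TYPES = frozenset({
    "application/ecmascript", "application/javascript",
    "application/x-ecmascript", "application/x-javascript",
    "text/ecmascript", "text/javascript",
    "text/javascript1.0", "text/javascript1.1", "text/javascript1.2",
    "text/javascript1.3", "text/javascript1.4", "text/javascript1.5",
    "text/jscript", "text/livescript",
    "text/x-ecmascript", "text/x-javascript",
})

# Extra types the proposal would keep allowing (taken from the comment above).
EXTRA_ALLOWED = frozenset({"text/html", "application/json", "text/plain"})


def essence(content_type):
    """Return the lowercased type/subtype without parameters, or "" if absent.

    Simplified assumption: splitting on ';' is enough for this sketch.
    """
    if content_type is None:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def is_allowed_for_script(content_type):
    """Hypothetical allowlist check for a script response's Content-Type."""
    mime = essence(content_type)
    if mime == "":
        return True  # empty / no Content-Type stays allowed under the proposal
    return mime in JAVASCRIPT_MIME_TYPES or mime in EXTRA_ALLOWED


if __name__ == "__main__":
    for value in ("text/javascript; charset=utf-8", None, "image/png"):
        print(repr(value), is_allowed_for_script(value))
```

Under this sketch anything not listed, such as application/octet-stream or application/xml, would be rejected, which is the behavioral change the numbers below are meant to size.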

MIME Loads %
javaScript 9723904447 95.45%
text_html 240640161 2.36%
empty 79707178 0.78%
app_json 77716915 0.76%
text_plain 44977157 0.44%
unknown 8032881 0.08%
image 6772345 0.07%
app_octet_stream 4899410 0.05%
app_xml 787319 0.01%
text_json 440959 0.00%
text_xml 37279 0.00%
audio 7459 0.00%
video 61 0.00%
text_csv 0 0.00%
Total 10187923571

Source: https://mzl.la/2SxxvNw

Note that we already block image/, which has almost the same percentage as unknown. The unknown bucket covers every MIME type that is not explicitly enumerated.
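To make the impact of the proposal concrete, the counts in the table can be summed per category. The sketch below is only arithmetic on the numbers above. It assumes that image, audio, video and text_csv are the categories already blocked today (the note above confirms only image/), and it treats everything else outside the proposed allowlist as newly blocked.

```python
# Script load counts copied from the table above.
LOADS = {
    "javaScript": 9723904447,
    "text_html": 240640161,
    "empty": 79707178,
    "app_json": 77716915,
    "text_plain": 44977157,
    "unknown": 8032881,
    "image": 6772345,
    "app_octet_stream": 4899410,
    "app_xml": 787319,
    "text_json": 440959,
    "text_xml": 37279,
    "audio": 7459,
    "video": 61,
    "text_csv": 0,
}

# Categories the proposal would keep allowing.
PROPOSED_ALLOWED = ("javaScript", "text_html", "empty", "app_json", "text_plain")

# Assumption: these are the categories that are already blocked today.
ALREADY_BLOCKED = ("image", "audio", "video", "text_csv")

total = sum(LOADS.values())
allowed = sum(LOADS[name] for name in PROPOSED_ALLOWED)
already_blocked = sum(LOADS[name] for name in ALREADY_BLOCKED)
newly_blocked = total - allowed - already_blocked

if __name__ == "__main__":
    print(f"total loads:     {total}")
    print(f"still allowed:   {allowed} ({100 * allowed / total:.2f}%)")
    print(f"already blocked: {already_blocked} ({100 * already_blocked / total:.2f}%)")
    print(f"newly blocked:   {newly_blocked} ({100 * newly_blocked / total:.2f}%)")
```

Under these assumptions roughly 99.79% of loads stay allowed and about 0.14% would be newly blocked, most of it in the unknown and app_octet_stream buckets.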

@annevk @mikewest

dveditz commented 5 years ago

I wonder if there's anything we can do to get all those text/html ones down.

mikewest commented 5 years ago

Chrome's numbers look a bit different:

Cross-origin scripts

MIME % of page views
text/html ~10%
text/plain ~4%
application/octet-stream ~1%
application/xml ~1%
Other ~25%

Same-origin scripts

MIME % of page views
text/html ~2%
text/plain ~0.3%
application/octet-stream ~0.05%
application/xml ~0.01%
Other ~3%

We might just be measuring different things. It looks like Mozilla's metrics use the number of scripts loaded as the denominator, while Chrome is measuring the number of pages on which any script had the given MIME type?

evilpie commented 5 years ago

> Other ~25%

That number is incredibly high. Sadly you don't seem to count application/json? Would this also include no Content-Type?

Would you assume that breaking cross-origin scripts would usually be less of a problem, assuming that a lot of those are tracking scripts?

> We might just be measuring different things. It looks like Mozilla's metrics use the number of scripts loaded as the denominator, while Chrome is measuring the number of pages on which any script had the given MIME type?

Yes, correct: this counts every script load. The number also includes ServiceWorker, Worker, etc., but those loads are so few compared to normal <script> loads that they are probably insignificant. We could add more counters later.

I am still surprised that the difference seems so high, but I don't have a good intuition on how those two measurements compare.

mikewest commented 5 years ago

> That number is incredibly high. Sadly you don't seem to count application/json? Would this also include no Content-Type?

Yes. "Other" is everything else, including application/json and the empty string.

> Would you assume that breaking cross-origin scripts would usually be less of a problem, assuming that a lot of those are tracking scripts?

Yes, that's exactly my intuition. Hence the separate metrics. :)

> I am still surprised that the difference seems so high, but I don't have a good intuition on how those two measurements compare.

I can imagine that Chromium's page-views-based number would look much higher than Mozilla's script-load-based number if there are a small number of very widely used scripts with incorrect MIME types. Facebook was in this category, as is VK, and a zillion ad scripts.
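A toy calculation may help show why the two denominators diverge. The data below is entirely synthetic and hypothetical (100 page views with 20 script loads each, and one widely embedded script served as text/html on 10 of those pages); it is not drawn from either browser's telemetry.

```python
GOOD = "text/javascript"


def per_load_share(pages):
    """Share of script loads with a non-JavaScript MIME type (script-load denominator)."""
    loads = [mime for page in pages for mime in page]
    return sum(mime != GOOD for mime in loads) / len(loads)


def per_page_share(pages):
    """Share of page views with at least one such load (page-view denominator)."""
    return sum(any(mime != GOOD for mime in page) for page in pages) / len(pages)


# Synthetic data: each inner list holds the MIME types of one page view's script loads.
# 10 pages embed one mis-typed script alongside 19 correct ones; 90 pages are clean.
pages = [[GOOD] * 19 + ["text/html"] for _ in range(10)]
pages += [[GOOD] * 20 for _ in range(90)]

if __name__ == "__main__":
    print(f"per script load: {100 * per_load_share(pages):.1f}%")
    print(f"per page view:   {100 * per_page_share(pages):.1f}%")
```

In this made-up setup the same 10 mis-typed loads account for 0.5% of script loads but 10% of page views, which is the direction of the gap between the two sets of numbers.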

I think it's worth experimenting in this direction, and explicitly allowing text/html and application/json probably takes care of a large chunk of the potential breakage. Still, I think it'll be necessary to do some more research before I'd be able to convince Blink folks to ship this kind of change.

annevk commented 5 years ago

It seems that even if we figure this out, https://github.com/whatwg/fetch/issues/721#issuecomment-470126129 (CORB++) will still be needed, because text/html and JSON are so prominent. Depending on the exact shape of this change, though, it might make for a simpler check there.