whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
7.86k stars 2.58k forks source link

Consider restricting form submissions containing unclosed `<textarea>` or `<select>` elements. #2253

Open mikewest opened 7 years ago

mikewest commented 7 years ago

HTML's parsing mechanism will automatically close <form>, <textarea>, <option>, <button> elements at the end of a file. This is fine from a parsing perspective, but the behavior does enable dangling markup attacks, such as those described in http://www.thespanner.co.uk/2011/12/21/html-scriptless-attacks/ and section 2 of http://lcamtuf.coredump.cx/postxss/.

I haven't added metrics to Chrome yet, and regexing this kind of data out of HTTPArchive is difficult, but my intuition is that we wouldn't break legitimate form submissions if we added a flag to elements noting whether they were in the stack of open elements during step 2 of https://html.spec.whatwg.org/#the-end, and prevented form submission (in the same way we decide on for #2252) if that flag was present on any of the form's submittable elements.

This change seems relatively low-risk, and would address a subset of dangling markup attacks that don't rely on a closing tag being present somewhere in the document.

@arturjanc: This is part of what we talked about yesterday.

@fmarier, @freddyb, @bzbarsky, @johnwilander, @teddink: Would y'all be interested in making this kind of change? Do other idea occur to y'all?

eligrey commented 7 years ago

I do not want to spend more of my CPU cycles performing prophylactic security for third parties, which should have been handled by those parties themselves.

mikewest commented 7 years ago

@eligrey: In an ideal world, I agree with you.

zcorpan commented 7 years ago

The parser can close these elements without their end tag sooner than The End. But adding a flag when it happens seems doable...

mikewest commented 7 years ago

The parser can close these elements without their end tag sooner than The End

Yeah. I'm discovering that while poking at a toy implementation. Parsing is complicated. :)

bzbarsky commented 7 years ago

I would be pretty worried about compat here, esp. for unclosed <form>.

Again, paging @hsivonen. And @smaug----.

mikewest commented 7 years ago

@bzbarsky: I agree about <form>, and I'm actually not that concerned about submitting a form that hasn't been closed; I'm more concerned about submitting a text sink (like those mentioned above) that's implicitly closed by the end of the file.

hsivonen commented 7 years ago

On a general level, I'm worried on security grounds about keeping changing the HTML parsing algorithm. If the server and the browser implement a different spec snapshot (libraries deployed in server-side software don't update every six weeks even if their upstream repos did), the server can parse something and decide it's OK and the browser may parse it into something different. (Depending on the parser change, this might apply even if the server parses and serializes, which the server should do.)

For that reason, I'm worried about trying to keep the language design consistent by adding special rules for new elements (e.g. menuitem) or by adding radical parsing features for convenience (e.g. certain HTML elements as children of SVG elements without a foreignObject in between. This is the most dangerous thing being proposed, AFAICT.)

CC @wycats, @annevk.

That said, I'm less worried about marking some DOM nodes broken but keeping the tree the same. We already did this for script elements as part of the initial standardization of the parsing algorithm. Still, it annoys me to keep changing browsers to whack-a-mole security problems on sites that don't do things the safest way (not letting any third-party markup get in the output without going through a server-side parser and then through a server-side serializer).

Of the elements at hand, marking textarea broken seems the least problematic: The only way to close a textarea without an end tag is by reaching the eof.

button worries me in terms of Web compat: It's quite reasonable to postulate a legitimate site that in its own markup has accidentally omitted the button end tag but site works the way its authors and users expect it to, because the button gets closed by a later </div> or something of that nature.

Aside: Does the mere typical pattern of wrapping user-supplied in a div foil the button attack then? Not really. This whole issue is premised on the browser trying to counteract the incompetence of the site (not running user-supplied content through a filter that parses the user-supplied content, drops what's not whitelisted from the tree and serialized the pruned tree), so we might as well postulate further incompetence like not filtering out <div> start tags, either, in which case the attacker could add a suitable number of <div> start tags to keep the site-supplied end tags from closing the button.

option is even supposed to to work without an end tag, so making option not work if unclosed seems bad. But you need a select to submit an option. Is autoclosed option even the right thing to consider? (As opposed to unclosed select?)

Is there value in addressing some of these if we don't address all of them? For example, would we gain anything if we marked unclosed textarea as unsubmittable but didn't do the same for button and select?

mikewest commented 7 years ago

Thanks, @hsivonen!

That said, I'm less worried about marking some DOM nodes broken but keeping the tree the same.

I agree with your general concerns about changing the parser. Though I think there are some parser changes that would effect tree-building that are reasonable to explore, you're right to say that they're best avoided. That said, I don't believe that the change suggested here is dangerous in this way. As you note, I'm suggesting that we set a flag on an element, not remove it from the tree.

Still, it annoys me to keep changing browsers to whack-a-mole security problems on sites that don't do things the safest way.

A counterpoint is that we've given developers a number of footguns by ensuring that the parser is fairly lax in what it accepts. Developers absolutely should be doing a better job escaping content they output, but I don't think that in itself is justification for maintaining the status quo. If there are small changes we can make that improve the defaults in a small way, then it seems reasonable to weigh those changes against the compatibility risk they introduce.

Of the elements at hand, marking textarea broken seems the least problematic

I agree, and it's also the one I'm most interested in addressing, as <select><option> doesn't actually eat element attribute values. That is, <textarea><script nonce='sekrit'> will slurp up the secret, whereas <select><option><script nonce='sekrit'> will execute the script. It's still possible to grab interesting data with <option>, but <textarea> is a bigger problem.

button worries me in terms of Web compat.

I'm less concerned about button than the others, as button doesn't actually submit the data it captures. That is, given <button name=yay><xmp><script nonce='sekrit'>, the button consumes the rest of the page, but clicking it only submits yay with no value.

A variant worries me a bit more, but is also harder to address:

<form method="post" action="https://example.com/">
  <button type="submit" name="yay" value='
    <script nonce=yay></script>
    I'm totally happy yay!
    <div></div>

I have some ideas for this general class of dangling markup injection, but this bug doesn't intend to address them. We can argue about how terrible various mitigations might be later. :)

Is autoclosed option even the right thing to consider? (As opposed to unclosed select?)

They're both required, so I'm fine poking at either. Would you prefer to flag the latter rather than the former?

so we might as well postulate further incompetence

This is generally a safe thing to do. Take Blink's behavior uncovered in https://github.com/whatwg/html/issues/2252, for instance. :)

Is there value in addressing some of these if we don't address all of them?

As noted above, I'd suggest that the difference in capabilities makes it most interesting to poke at <textarea>, and even if we decide not to make changes to other elements, we'd still have reduced the attack surface. I'd prefer to go further, of course. :)

hsivonen commented 7 years ago

button doesn't actually submit the data it captures

I suggest we don't change button, then.

Is autoclosed option even the right thing to consider? (As opposed to unclosed select?)

They're both required, so I'm fine poking at either. Would you prefer to flag the latter rather than the former?

Yes.

Without having spent too much time to ponder this, I think it might be OK to do this:

@smaug----, thoughts?

hsivonen commented 7 years ago

Fun stuff to spec: Does the "do not submit" flag get cloned if the node is cloned?

mikewest commented 7 years ago
  • Parser-created textarea and select nodes are flagged as "do not submit".
  • When the parser processes the end tag for textarea or select, it unsets the "do not submit" flag on the corresponding node.

This seems reasonable.

  • Form submission fails in some defined way if there are "do not submit" form controls

I wonder if reusing the invalid event (effectively treating this as an additional validation step) might be a reasonable approach.

zcorpan commented 7 years ago

No, that can be bypassed with novalidate.

mikewest commented 7 years ago

No, that can be bypassed with novalidate.

I was more interested in reusing the invalid event to inform developers that something's gone wrong than in matching validation behavior. I agree that we wouldn't want novalidate to have an effect.

zcorpan commented 7 years ago

Ah ok. I'm still not sure it makes sense to use that event... An 'error' event seems more appropriate. 'invalid' is something the user needs to fix in the form fields before the form can be submitted. Informing developers about problems should be exceptions, error events, console, etc.

mikewest commented 7 years ago

Sure, I'm happy with error. I'm putting together a prototype in Blink that we can use to collect some metrics about how often this would trigger... Let's see what the numbers look like.

mikewest commented 7 years ago

Fun stuff to spec: Does the "do not submit" flag get cloned if the node is cloned?

I missed this, sorry @hsivonen.

For the specific use-case I'm concerned about, cloning the node seems like an explicit-enough acceptance of its role in the document that I'd be fine allowing it to be used in form submission. Restricting the new behavior to parser-inserted elements seems like a reasonable tradeoff between annoying developers and removing footguns.

smaug---- commented 7 years ago

I'm ok with https://github.com/whatwg/html/issues/2253#issuecomment-272150766 (though I still wonder if we're fixing any real world issue here or only theoretical issues. Need to be careful with adding more special cases; making the platform more complicated makes it harder to maintain and understand)

mikewest commented 7 years ago

@smaug----: Dangling markup is a real world issue. Basically, it's the next softest target after we kill XSS (http://lcamtuf.coredump.cx/postxss/ is a pretty good summary). Particularly clever developers are doing their best to defend against it today (see GitHub's <form> element prefix, for example), but it seems like a reasonable spot in the platform to harden.

I agree that adding complexity to the platform by tightening some of the processing rules is a dangerous thing to do in general; I believe, however, that a few targeted changes could substantially mitigate a class of attack. That seems like a good tradeoff to me. Let's see if the metrics I'm adding to Chrome end up showing the kinds of results I expect.

DemiMarie commented 6 years ago

I am honestly not too concerned about changing the parsing algorithm. Every decent sanitizer that I know of parses the document into a tree, filters the tree, and finally serializes the tree. The serialization is invariably into a canonicalized form.

That said, I do think that this is an argument for websites to use the XML serialization of HTML wherever possible, since its parsing algorithm is 100% set in stone.

westonruter commented 4 months ago

I just encountered this issue with #enable-experimental-web-platform-features being enabled in Chromium. It apparently breaks form submissions in Trac Custom Queries. See https://trac.edgewall.org/ticket/13141 and https://meta.trac.wordpress.org/ticket/7459

Chromium issue: https://bugs.chromium.org/p/chromium/issues/detail?id=947878

I would get an error like:

Form submission failed, as the <SELECT> element named '0_description_mode' was implicitly closed by reaching the end of the file. Please add an explicit end tag ('</SELECT>')

I was able to fix the error by supplying the missing </select> in the relevant file:

-     var e = $($.htmlFormat('<select name="$1">', name));
+     var e = $($.htmlFormat('<select name="$1"></select>', name));
mikewest commented 4 months ago

Thanks for the reminder about this discussion. The tickets you've referenced are helpful as well.

It looks like form submissions with unclosed <textarea> and/or <select> elements occur on ~0.002% of Chromium page views: https://chromestatus.com/metrics/feature/timeline/popularity/1767. That seems low enough to make deprecation plausible, but not easy. This metric doesn't split out <textarea> and <select>, though. It's likely that one or the other is less popular and could be changed more easily.