serde-rs / json

Strongly typed JSON library for Rust
Apache License 2.0
4.85k stars 555 forks source link

Document that serde_json doesn't escape forward slashes #537

Open KamilaBorowska opened 5 years ago

KamilaBorowska commented 5 years ago

An user should manually escape the slash (for instance using https://github.com/serde-rs/json/issues/435#issuecomment-382108021) when putting JSON inside of <script> (yes, people do that), otherwise an XSS will occur. This ideally would be documented, as it is a potential security vulnerability.

andy128k commented 5 years ago

Can we have one more formatter which also escapes forward slashes (and possibly also <, >, &)?

KamilaBorowska commented 5 years ago

I would argue that escaping <, > and & is a responsibility of an HTML escape library, and should be a rare use-case, I'm thinking about an use of JSON in <script>, where those characters are non-issue and shouldn't be escaped.

webmaster128 commented 4 years ago

Why would you want to escape forward slashes? They can only occur in JSON strings where they are perfectly valid JSON (and JavaScript, right?):

{ "foo/bar": "spam //////////////////"}

But there is a different danger when doing this kind of embedding: there are unicode characters that are valid in JSON strings but not in JS strings: https://twitter.com/simon_warta/status/1255505650257362944

dtolnay commented 4 years ago

@webmaster128: https://stackoverflow.com/a/1580682/6086311

andy128k commented 4 years ago

To use serde_json in such scenarios, I had created alternative formatter https://github.com/xd009642/tarpaulin/blob/master/src/report/safe_json.rs

It would be great if such formatter is available out of the box.

KamilaBorowska commented 4 years ago

@webmaster128 U+2028 and U+2029 are now allowed in JavaScript strings, however fair point in that it will only work on relatively modern web browsers (Chrome 66+, Edge 79+, Firefox 62+, Opera 53+, Safari 12+). This may or may not be okay.

Escaping forward slashes is necessary if a JSON string is embedded in <script> tag and happens to contain </script>. Not escaping a forward slash will cause an XSS vulnerability.

And researching more about that, <script> and <!-- can also cause issues, so maybe the documentation should specifically discourage storing JSON in <script>. The HTML specification has a rather complicated recommendation on avoiding the issue: https://html.spec.whatwg.org/multipage/scripting.html#restrictions-for-contents-of-script-elements.

I'm not saying serde_json should escape those characters, but I think it should document that it doesn't do that so that users would know about the issue.

andy128k commented 4 years ago

There is nothing bad in embedding JSON into <script>. JSON is a subset of JS (since ES2019).

webmaster128 commented 4 years ago

Thanks for the answers. I continued the research a little bit to understand why this is a problem. This is described in this StackOverflow answer and the linked documents. It is not at all about JavaScript. It is about how HTML parsers detect the end of a <script>. The following code is also valid JavaScript but is documented to break the <script> block:

<SCRIPT type="text/javascript">
      document.write ("<EM>This won't work</EM>")
</SCRIPT>

So no matter how valid your JS is, if the browser does not read it completely, you might get doomed. I don't know how HTML parsers have evolved over time and how relevant the issue still is.

lambda-fairy commented 4 years ago

@webmaster128 this StackOverflow answer is more up-to-date: https://stackoverflow.com/a/14608300/617159

andy128k commented 4 years ago

@webmaster128 HTML parser doesn't care about JavaScript (And it should not).

it sees tokens differently.

<SCRIPT type="text/javascript">
      document.write ("
      <EM>
          This won't work
      </EM>
      ")
</SCRIPT>
alex35mil commented 3 years ago

I was about to crate an issue but found this one. Placing JSON in <script> tag is valid case, e.g. when you want to rehydrate client app with server state, you have to place data into DOM so client could pick it up. Since it's perf sensitive part, it makes sense to serialize JSON + escape HTML chars and JS line terminators in single pass.