michalmuskala / jason

A blazing fast JSON parser and generator in pure Elixir.

html_safe encoding bug #151

Closed by hyperknot 1 year ago

hyperknot commented 2 years ago

The html_safe encoding doesn't always work. For example:

<script> -> \u003Cscript>
</script> -> \u003C\/script>

The browser parses this as: \x3Cscript> or \x3C/script>

My reference implementation is htmlsafe_json_dumps from Python's Jinja2: https://github.com/pallets/jinja/blob/4bbb1fb5fe5ec141d302c5baff95165887fb7338/src/jinja2/utils.py#L626

    return markupsafe.Markup(
        dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )

The Python implementation encodes:

<script> -> \u003cscript\u003e
</script> -> \u003c/script\u003e

The browser correctly parses these.
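
For anyone who wants to try the replacements outside of Jinja, here is a minimal standalone sketch. It assumes the standard library's json.dumps in place of Jinja's configurable dumps and skips the markupsafe.Markup wrapper, so it is an approximation of the quoted function, not the library's actual code.

```python
import json


def htmlsafe_json_dumps(obj, **kwargs):
    """Standalone approximation of the Jinja2 helper quoted above.

    Assumption: plain json.dumps stands in for Jinja's dumps, and the
    result is returned as a str instead of markupsafe.Markup.
    """
    return (
        json.dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )


encoded = htmlsafe_json_dumps("</script>")
print(encoded)  # "\u003c/script\u003e"

# The escaped text is still valid JSON and decodes to the original string.
assert json.loads(encoded) == "</script>"
```

The replacements run on the already serialized JSON text, which is why four plain string substitutions are enough: the four characters are swapped for \u escapes that any JSON parser decodes back to the original.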

It might be as simple as a lowercase vs. uppercase C, but the implementation looks quite complex, so I couldn't figure out the bug. I like the simplicity of the Python implementation: it's just 4 string replacements.

michalmuskala commented 1 year ago

I'm sorry for the late reply.

In general, in JavaScript syntax "\u003C" and "\x3C" represent the exact same string (this is not true of JSON where only the \u forms are supported).
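
The JSON half of that statement can be checked with a short sketch. It assumes Python's standard json module as a stand-in for a strict JSON parser; it says nothing about how a JavaScript engine or browser treats the same text. The helper is_valid_json is a hypothetical name used only for this illustration.

```python
import json

# Hex digits in a \u escape are case-insensitive, so the uppercase and
# lowercase spellings decode to the same string.
upper = json.loads('"\\u003Cscript>"')
lower = json.loads('"\\u003cscript>"')
assert upper == lower == "<script>"

# "\/" is a valid JSON escape for "/".
assert json.loads('"\\u003C\\/script>"') == "</script>"


def is_valid_json(text):
    """Hypothetical helper: True if the strict parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# "\x3C" is not a JSON escape, so the parser rejects it.
print(is_valid_json('"\\x3Cscript>"'))  # False
```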

[Screenshot 2022-09-12 at 21:40:37]

I can't really reproduce the issue. If you have a way for me to reproduce, feel free to reopen.