html_safe encoding bug

The html_safe escape mode doesn't always produce output that the browser decodes correctly. For example:

    <script>  -> \u003Cscript>
    </script> -> \u003C\/script>

The browser parses these as \x3Cscript> and \x3C/script>, respectively.

My reference implementation is htmlsafe_json_dumps from Python's Jinja2: https://github.com/pallets/jinja/blob/4bbb1fb5fe5ec141d302c5baff95165887fb7338/src/jinja2/utils.py#L626

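    import json
    import markupsafe

    def htmlsafe_json_dumps(obj, dumps=None, **kwargs):
        if dumps is None:
            dumps = json.dumps  # Jinja2 falls back to json.dumps by default
        return markupsafe.Markup(
            dumps(obj, **kwargs)
            .replace("<", "\\u003c")
            .replace(">", "\\u003e")
            .replace("&", "\\u0026")
            .replace("'", "\\u0027")
        )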

The Python implementation encodes them as:

    <script>  -> \u003cscript\u003e
    </script> -> \u003c/script\u003e

The browser correctly parses these.
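For comparison, the four replacements can be reproduced with the standard library alone. This is a hypothetical sketch: the name html_safe_json_dumps is illustrative, and unlike Jinja2's function it returns a plain str instead of markupsafe.Markup and does not accept a custom dumps argument.

```python
import json


def html_safe_json_dumps(obj, **kwargs):
    """Hypothetical stdlib-only variant of Jinja2's htmlsafe_json_dumps.

    Serializes with json.dumps, then replaces the four HTML-significant
    characters with lowercase-hex JSON unicode escapes. Returns a plain str.
    """
    return (
        json.dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )


if __name__ == "__main__":
    for s in ["<script>", "</script>"]:
        print(s, "->", html_safe_json_dumps(s))
    # <script> -> "\u003cscript\u003e"
    # </script> -> "\u003c/script\u003e"
```

Because every <, >, & and ' is replaced unconditionally, the output never contains a literal </script or <!-- sequence, which is what keeps it from terminating an enclosing script element early.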

It might be as simple as lowercase vs. uppercase C, but Jason's implementation looks quite complex, so I couldn't pin down the bug. I like the simplicity of the Python implementation: it's just four string replacements.
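One way to test the lowercase vs. uppercase hypothesis is to decode both sets of outputs with a JSON parser. The sketch below wraps the outputs quoted in this issue in double quotes (an assumption, so they parse as JSON string literals) and decodes them with Python's json module. It only compares the decoded string values; it does not model how the browser's HTML parser treats the raw text inside a script element.

```python
import json

# Outputs quoted in this issue, wrapped in quotes so they are JSON strings.
jason_out = ['"\\u003Cscript>"', '"\\u003C\\/script>"']             # uppercase hex, "\/"
jinja_out = ['"\\u003cscript\\u003e"', '"\\u003c/script\\u003e"']   # lowercase hex

for a, b in zip(jason_out, jinja_out):
    # JSON \u escapes accept hex digits in either case, and "\/" is a valid
    # escape for "/", so each pair should decode to the same string.
    print(json.loads(a), json.loads(b), json.loads(a) == json.loads(b))
# <script> <script> True
# </script> </script> True
```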

michalmuskala/jason, issue #151