krux / postscribe

Asynchronously write javascript, even with document.write.
MIT License

Ampersands being escaped in document.write calls #346

Open d3x7r0 opened 7 years ago

d3x7r0 commented 7 years ago

This relates to issue #98 and the fix made for it.

We just came across some calls being made that got broken when we upgraded from v1 to v2.

The code we found coming from the adserver is something like this:

function(u, t) { // u = "http://example.com?foo=bar&a=1"; t = false
    var a = document.createElement('script');
    a.src = u;
    a.async = t;
    t ? document.appendChild(a) : document.write(a.outerHTML);
}

This worked fine in v1 but now it's broken because the browser returns that outerHTML escaped.

If you run the code directly in a browser or through v1.4.0, it works normally and the resulting URL is correct (unescaped). If, however, you pass the code through postscribe v2, you end up with "http://example.com?foo=bar&amp;a=1" as the URL for the script tag, so the request goes out with a mangled query string.
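To make the escaping concrete, here is a minimal sketch of what happens to an attribute value when an element is serialized through outerHTML. The helper names (escapeAttrValue, serializeScript) are made up for illustration and only approximate the browser's behaviour; newer browsers may escape more characters than these three.

```javascript
// Hypothetical helper: approximates the attribute escaping a browser
// applies when serializing an element (as outerHTML does).
function escapeAttrValue(value) {
  return String(value)
    .replace(/&/g, '&amp;')       // ampersand first, so later entities are not double-escaped
    .replace(/\u00a0/g, '&nbsp;') // non-breaking space
    .replace(/"/g, '&quot;');     // double quote, since the value is wrapped in double quotes
}

// Hypothetical helper: builds the markup that a.outerHTML would roughly
// produce for a script element whose src is the given URL.
function serializeScript(src) {
  return '<script src="' + escapeAttrValue(src) + '"></script>';
}

var html = serializeScript('http://example.com?foo=bar&a=1');
console.log(html);
```

A native document.write decodes &amp; back to & while parsing this markup, which is why the same string works outside postscribe v2.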

dompuiu commented 7 years ago

I am interested in fixing this bug, but first I wanted to discuss what the best option would be.

Looking at postscribe 1.4.0, it seems the src attribute was unescaped here: https://github.com/krux/postscribe/blob/1.4.0/htmlParser/htmlParser.js#L149

That part of the code now seems to live in prescribe: https://github.com/krux/prescribe/blob/master/src/streamReaders.js#L62

One option would be to do the unescaping inside the prescribe code, as was previously done in postscribe 1.4.0. Another option would be to do the unescaping inside postscribe's _handleScriptToken method (https://github.com/krux/postscribe/blob/master/src/write-stream.js#L330). Thoughts?

cobbdb commented 7 years ago

This is happening for quotes as well:

var script = document.createElement('script');
script.src = '...';
script.setAttribute('data-anvp', '{"some":"data"}');
postscribe('#target', script.outerHTML);

.. becomes ..

<script src=".." data-anvp="{&quot;some&quot;:&quot;data&quot;}"></script>

which errors when parsing that attribute value.

cobbdb commented 7 years ago

Was able to work around this by filtering with the beforeWriteToken callback:

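postscribe('#p1', script.outerHTML, {
    beforeWriteToken: function (token) {
        // Some tokens may have no attrs, so guard before reading them.
        var anvp = token.attrs && token.attrs['data-anvp'];
        if (anvp) {
            // Restore the quotes that outerHTML serialized as &quot;
            token.attrs['data-anvp'] = anvp.replace(/&quot;/g, '"');
        }
        return token;
    }
});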

.. which writes without the HTML entity, and the attribute value then passes JSON.parse() without errors.
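The same idea could be stretched to cover every attribute, including the src ampersand case from the original report. This is only a sketch: decodeEntities and unescapeTokenAttrs are hypothetical helpers, not part of postscribe, and the token shape (an optional attrs object of strings) is assumed from the snippet above. It has not been run against postscribe itself.

```javascript
// Hypothetical helper: decodes the few entities that outerHTML
// serialization is likely to introduce into attribute values.
function decodeEntities(value) {
  return String(value)
    .replace(/&quot;/g, '"')
    .replace(/&nbsp;/g, '\u00a0')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&'); // last, so "&amp;quot;" decodes once to "&quot;"
}

// Hypothetical beforeWriteToken callback: unescapes every attribute on
// tokens that have an attrs object and passes other tokens through untouched.
function unescapeTokenAttrs(token) {
  if (token && token.attrs) {
    Object.keys(token.attrs).forEach(function (name) {
      token.attrs[name] = decodeEntities(token.attrs[name]);
    });
  }
  return token;
}

// Usage in a browser with postscribe loaded (assumed, not verified here):
// postscribe('#p1', script.outerHTML, { beforeWriteToken: unescapeTokenAttrs });
```

Decoding unconditionally is a trade-off: an attribute that legitimately needs to keep a literal entity would be changed too, so restricting the callback to known attribute names (as in the workaround above) is the safer choice when that matters.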

matzeeable commented 3 years ago

I think this also relates to #506.