googlearchive / caja

Caja is a tool for safely embedding third party HTML, CSS and JavaScript in your website.
Apache License 2.0
1.13k stars 127 forks source link

Caja does not strip downlevel-revealed conditional comments [possible fix included] #1920

Open kpreid opened 9 years ago

kpreid commented 9 years ago

Original issue 1922 created by morgan.anthony on 2014-06-19T14:50:25.000Z:

What revision of the cajoler exhibits the problem? On what browser and OS?

svn HEAD - all platforms

What steps will reproduce the problem?

var example = caja.makeSaxParser({ pcdata : function(x) { if (/vml/.test(x)) throw new Error("not stripping downlevel-revealed conditional") }, }); example('<![if !vml]>foo<![endif]>');

What is the expected output? What do you see instead?

It throws the exception. It shouldn't if caja is to handle (drop) these, admittedly, annoying junk tags.

Please provide any additional information below.

http://msdn.microsoft.com/en-us/library/ms537512(v=vs.85).aspx#dlrevealed

these are pretty common in emails, which is the use case I'm looking at here.

I've "fixed" this issue by changing:

    case '&lt;\!':
      if (!/^\w/.test(next)) {
        if (h.pcdata) {

to:

    case '&lt;\!':
      if (!/^(\w|\[)/.test(next)) {
        if (h.pcdata) {

In parseCPS function, to catch it along with doctype declarations and the like. But of course I'm not sure that's right, having only a cursory understanding of the code.

I tried to run the tests, but seemingly did not have enough memory on this machine!

kpreid commented 9 years ago

Comment #1 originally posted by kpreid@google.com on 2014-08-04T17:41:08.000Z:

I'm not sure what the issue here is. The stuff is designed to be parsed as junk tags by non-IE parsers, and in the particular case of Caja they'll be discarded or rewritten by the whitelist. (Also, in general, when there's doubt about how Caja should interpret HTML input, we currently prefer to use the HTML5 / WHATWG rules.)

Is there a case where Caja fails to sanitize/sandbox/render content properly due to this?

Is there a specific use case for Caja's parser you have in mind that would benefit? You say you're parsing HTML email, but are you doing something other than using the Caja sandbox for the results?

kpreid commented 9 years ago

Comment #2 originally posted by morgan.anthony on 2014-08-04T23:34:50.000Z:

The stuff is designed to be parsed as junk tags by non-IE parsers, and in the particular case of Caja they'll be discarded or rewritten by the whitelist.

I'm not sure I was clear, the tags themselves are not being discarded. The tags are parsed as plain text. That is what I was attempting to illustrate with my example. e.g. currently:

var example = caja.makeSaxParser({pcdata : function(x) { console.log(x) } }); example('<![if !vml]>foo<![endif]>');

yields the following console output:

<! [if !vml] > foo <! [endif] >

I would expect just:

foo

Is there a specific use case for Caja's parser you have in mind that would benefit? You say you're parsing HTML email, but are you doing something other than using the Caja sandbox for the results?

I am using the sax parser directly, and displaying it's output outside of the sandbox. I have html input (emails) with downlevel-revealed comments, which I wish to handle as above.

kpreid commented 9 years ago

Comment #3 originally posted by kpreid@google.com on 2014-08-04T23:59:13.000Z:

Thank you for the clarification; that's definitely a bug.