oalders / html-restrict

HTML::Restrict - Strip away unwanted HTML tags

The result of tags removal can be an insecure HTML #29

Closed jurajmajor closed 5 years ago

jurajmajor commented 5 years ago

Define

my $t = qq(<<input>div onmouseover="alert(1);">hover over me<<input>/div>);

The content of $t is safe (though not well-formed) HTML. However, after using

my $r = HTML::Restrict->new;
say $r->process($t);

I get the string <div onmouseover="alert(1);">hover over me</div>, which is definitely not what I would expect, as the onmouseover attribute is not whitelisted.

A good solution would be to call the removal procedure in a loop until the string no longer changes. If that is not possible for some reason, the docs should recommend passing only well-formed HTML to HTML::Restrict (HTML::TreeBuilder could help with that).

oalders commented 5 years ago

Good catch. Looping over the string sounds like the simplest, safest (if slower) thing to do. Forcing someone to use only well-formed HTML seems fairly burdensome.

I guess in the meantime you could call process() in a loop to get the same outcome, but this really should be fixed internally, possibly with an option to disable the looping behaviour when you're sure that you already have well-formed HTML.
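
The "call it in a loop until the string doesn't change" workaround discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: sanitize_until_stable is a hypothetical helper, not part of HTML::Restrict, the cap of 10 passes is arbitrary, and a naive regex tag stripper stands in for the real sanitizer so that the snippet runs with core Perl alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Restrict): re-apply a sanitizer
# callback until its output stops changing, i.e. until a fixed point.
sub sanitize_until_stable {
    my ( $sanitize, $html, $max_passes ) = @_;
    $max_passes //= 10;    # arbitrary safety cap (an assumption)

    for my $pass ( 1 .. $max_passes ) {
        # Treat an undefined result as the empty string.
        my $cleaned = $sanitize->($html) // '';
        return $cleaned if $cleaned eq $html;    # nothing left to strip
        $html = $cleaned;
    }
    die "Input did not stabilise after $max_passes passes\n";
}

# Stand-in sanitizer, for demonstration only: a naive single-pass
# tag stripper that removes anything shaped like <...>.
my $strip_tags = sub {
    my ($html) = @_;
    $html =~ s/<[^<>]*>//g;
    return $html;
};

# The input from the report above.
my $t = qq(<<input>div onmouseover="alert(1);">hover over me<<input>/div>);

my $once   = $strip_tags->($t);
my $stable = sanitize_until_stable( $strip_tags, $t );

print "one pass:     $once\n";      # <div onmouseover="alert(1);">hover over me</div>
print "until stable: $stable\n";    # hover over me
```

With HTML::Restrict itself, the callback would presumably be something like sub { $r->process( $_[0] ) } for the $r object from the report. The pass cap is there only so that a pathological input cannot spin forever; whether to die or to return an empty string when it is hit is a design choice left open here.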