scour-project / scour

Scour - An SVG Optimizer / Cleaner
Apache License 2.0
765 stars 60 forks source link

Sanitize potential security vulnerabilities from SVG files #254

Open ottok opened 4 years ago

ottok commented 4 years ago

Here is a pretty good article on the topic: https://kinsta.com/blog/wordpress-svg/#svg-security. It links for more information to https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing, https://www.owasp.org/images/0/03/Mario_Heiderich_OWASP_Sweden_The_image_that_called_me.pdf

There is a PHP implementation at https://github.com/darylldoyle/svg-sanitizer.

Maybe Scour could include similar sanitizations?

oberstet commented 4 years ago

see my comment https://github.com/scour-project/scour/issues/29#issuecomment-640100963

oberstet commented 4 years ago

Scour was I guess created primarily to minimize and sanitize own authored SVGs, not security-scrutinize foreign SVGs.

So when I create an SVG myself, I will be aware of included (friendly) JS.

For browser loading SVGs from anywhere, this is different (and generally should be the place IMO to address this).

It is also different for HTML authors including SVGs not authored by themself.

Ideally, the HTML tags to include SVGs should have an option to turn off all the JS crap potentially embedded.

An sanitizer external to a browser (a HTML rendering engine) would not help, as the sanitizer cannot change the SVG sourced if it is not authored/hosted by the HTML author.

So what exactly is the attack scenario when I have authored/hosted the SVG myself? And if I haven't authored/hosted the SVG myself, but only source it from my HTML, then how does a sanitizer help?


that being said, quick look at the PDF linked.

  1. "Scripting and Events"
  2. "Inclusion of arbitrary objects"
  3. "Links"

bad, sure.

so let's say, I download an SVG from the net with the goal of hosting it myself, and including it then in my HTML. in that scenario, I probably not only want to minimize the SVG before uploading, but also remove any of the crap above.

  1. "In-line SVG “self-terminates” open HTML elements"

sound potentially bad. we can't do anything about that, as it is the browser HTML parser that's in control here ..

oberstet commented 4 years ago

ok, so there are indeed test cases in https://github.com/darylldoyle/svg-sanitizer/tree/master/tests/data

for embedded JS:

oberstet@intel-nuci7:~/scm/3rdparty/svg-sanitizer/tests/data$ grep "script" *.svg
badXmlTestOne.svg:<script>alert(1);</script>
hrefTestOne.svg:    <a href="javascript:alert(2)">test 1</a>
hrefTestOne.svg:    <a xlink:href="javascript:alert(2)">test 2</a>
hrefTestOne.svg:    <a href="javascript&#9;:alert(document.domain)">test 7</a>
hrefTestTwo.svg:    <a href="javascript:alert(2)">test 1</a>
hrefTestTwo.svg:    <a xlink:href="javascript:alert(2)">test 2</a>
hrefTestTwo.svg:    <a href="javascript&#9;:alert(document.domain)">test 7</a>
svgTestOne.svg:<script>alert(1);</script>

external content:

oberstet@intel-nuci7:~/scm/3rdparty/svg-sanitizer/tests/data$ grep 'xlink:href="http' *.svg
useDosCleanTwo.svg:        <image xlink:href="https://image.shutterstock.com/image-photo/beautiful-water-drop-on-dandelion-260nw-789676552.jpg"></image>
useDosCleanTwo.svg:        <image xlink:href="https://image.shutterstock.com/image-photo/beautiful-water-drop-on-dandelion-260nw-789676552.jpg"></image>
useDosTestTwo.svg:<image xlink:href="https://image.shutterstock.com/image-photo/beautiful-water-drop-on-dandelion-260nw-789676552.jpg"/>
useDosTestTwo.svg:<image xlink:href="https://image.shutterstock.com/image-photo/beautiful-water-drop-on-dandelion-260nw-789676552.jpg"/>
oberstet@intel-nuci7:~/scm/3rdparty/svg-sanitizer/tests/data$ 

other


oberstet@intel-nuci7:~/scm/3rdparty/svg-sanitizer/tests/data$ grep 'foreignObject' *.svg
oberstet@intel-nuci7:~/scm/3rdparty/svg-sanitizer/tests/data$ grep 'iframe' *.svg
oberstet@intel-nuci7:~/scm/3rdparty/svg-sanitizer/tests/data$ grep 'onload' *.svg
badXmlTestOne.svg:<line onload="alert(2)" fill="none" stroke="#000000" stroke-miterlimit="10" x1="119" y1="84.5" x2="454" y2="84.5"/>
hrefTestOne.svg:    <a href="data:data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' onload='alert(88)'%3E%3C/svg%3E">test 5</a>
hrefTestOne.svg:    <a xlink:href="data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' onload='alert(88)'%3E%3C/svg%3E">test 6</a>
hrefTestTwo.svg:    <a href="data:data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' onload='alert(88)'%3E%3C/svg%3E">test 5</a>
hrefTestTwo.svg:    <a xlink:href="data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' onload='alert(88)'%3E%3C/svg%3E">test 6</a>
svgTestOne.svg:<line onload="alert(2)" fill="none" stroke="#000000" stroke-miterlimit="10" x1="119" y1="84.5" x2="454" y2="84.5"/>
`` 
Ede123 commented 4 years ago

If security is a concern, embedding SVGs in an <img> tag is the safest approach, as it will prevent the SVG from loading any JavaScript or external resources.
Are you aware of any security vulnerabilities that are not caught by this approach @ottok?

Within the scope of Scour I'd say support for stripping JavaScript and similar could be considered as a way of minimizing even further at the user's request, but it should be clear Scour is not a tool to detect and remove security vulnerabilities (and people should never rely on Scour to do so for security-critical applications). That said I expect that in most of the cases there will only ever be JavaScript in SVG files when the user put it there on purpose (and therefore probably does not want to remove it), so it also won't be an overly useful option.

oberstet commented 4 years ago

embedding SVGs in an tag is the safest approach, as it will prevent the SVG from loading any JavaScript or external resources

nice! we should probably add this as a BIG NOTE / WARNING to the readme ..

does is also prevent any embedded JS (not externally referenced/loaded) from running?

does that prevent any loading of externally referenced content (eg non-JS) as well? because of privacy .. eg single-pixel images embedded just to track users. as HTML mails often do.

as a user I just want SVG to behave like a self-contained image.. no JS, no externally referenced content. I just want a vector image.

ottok commented 4 years ago

Personally for me the use case would be to add Scour to my toolset (among optipng and jpegoptim) to run on all images I either produce myself, or to run automatically on images users upload on a website (e.g. their profile image).

Having the ability to sanitize SVG in Scour would be cool so I don't need to have a separate tool for that.

Now when all modern browsers support SVG, the only thing preventing SVG from being more common is the tooling to automatically strip out injected contents from them. This is for example the reason why WordPress does not allow SVG images to be used on WordPress sites (and thus users need to specifically install https://wordpress.org/plugins/svg-support/ and whatnot to get SVG support in WordPress). Personally I could use Scour in a WordPress pipeline to enable SVG on WordPress, is Scour had similar features as the above linked svg-sanitizer.

oberstet commented 4 years ago

run automatically on images users upload on a website

ok, yeah, got you. that use case makes sense for scour as well - and then we could have new options like:

--remove-embedded-js
--remove-embedded-objects
--remove-external-js
--remove-external-images
--remove-external-objects
ottok commented 4 years ago

The options above sound good to me!

nthykier commented 4 years ago

Note that scour uses minidom to parse XML and Python has published the following warning about using minidom on "untrusted input":

Warning

The xml.dom.minidom module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities.

Source: https://docs.python.org/3/library/xml.dom.minidom.html XML Vulnerabilities: https://docs.python.org/3/library/xml.html#xml-vulnerabilities

IOW, we would have to replace/rewrite the XML parser first before it is safe to use it on untrusted input. AFAICT, defusedxml is the only parser that is "safe to use" (at least out of the box).