Add SVG support - Githubissues

jams2 commented 2 years ago

This PR adds support for SVGs, and rasterising them to PNG with svglib. This is part of the work described in https://github.com/wagtail/rfcs/pull/77. The high level goals for this PR, described in the RFC, are:

provide classes to handle SVG files (SVGImageFile, SVGImage, SVGImageWrapper)
provide a plugin for rasterisation to PNG, using svglib
transformations (cropping and resizing) should be collected on the SVGImage instance, and applied at the time of rasterisation

To mitigate security issues related to parsing SVGs defusedxml is used for parsing SVG files - with DTD retrieval, entities, and external references disabled as recommended in that library's best practices. It may be worth considering setting an upper bound on SVG file size and limiting parse time, but this seems like the concern of calling code (e.g. Wagtail).

dopry commented 2 years ago

I was so excited to see this, I started working on SVG support in Wagtail . I can upload an SVG in the Admin and see it, but renditions are just copys and width and height are statically set to 100... If this can land, it should be a pretty quick update to get proper rendition support.

zerolab commented 2 years ago

https://blog.cloudflare.com/svg-support-in-cloudflare-images/, particularly https://github.com/cloudflare/svg-hush may be of interest (probably as a future extension, via a package that provides Python bindings for the Rust lib? 🤔)

jams2 commented 2 years ago

In https://github.com/wagtail/Willow/pull/99/commits/4a030171693eae2111ad3d10d7a57edf225a326c I made changes to perform crop and resize operations directly to an SvgImage, without rasterising.

"Cropping" an SVG means setting its viewBox attribute such that it bounds the desired graphic content. This may require scaling and translating input coordinates, which was missing in previous commits and is now handled. This is effective with a small caveat - SVG elements with attributes relative to the viewBox may be negatively affected by a change in the viewBox value. However, an SVG could still be rasterised before being cropped to get around this, and given the lack of support for an explicit crop operation with svglib/reportlab, I think this is acceptable.

Resizing can be accomplished by setting the height and width attributes - image scaling can be achieved by scaling these attributes while maintaining their aspect ratio.

There are cases in which it is not immediately obvious how to crop an SVG:

it has no viewBox defined - in this case we set the viewBox to "0 0 {width} {height}", a reasonable default as we maintain a 1:1 mapping between viewport and user space dimensions;
it has no width and no height defined - we set the width and height to the width and height values of the viewBox, meaning the viewport matches the dimensions of the viewBox; and
it has neither viewBox nor width/height.

In the final case, we take our cue from browsers and set a width of 300 and a height of 150 (then, the viewBox will be set to "0 0 300 150" to match). This seems reasonable as an SVG is highly likely destined for the web anyway, and has arguably limited utility without these attributes.

With these behaviours, we can crop and resize any input SVG by adjusting its width, height and viewBox attributes. I believe this is a better solution than collecting the operations and applying them at the time of rasterisation for the following reasons:

we are able to present an API for manipulating SVGs to applications such as Wagtail, without requiring extra plugins to be installed, and without incurring the extra computational cost of rasterisation (which I admittedly have not measured at this stage);
SVGs have a number of neat properties, such as the ability to be upscaled;
The rasterisation process in svglib (which uses reportlab under the hood) is somewhat lacking - notably, it doesn't handle transparent backgrounds and appears to not respect preserveAspectRatio (it would be worth investigating cairosvg soon)

Attached are screenshots (appropriately, rendered as SVGs) of a Python session demonstrating cropping and resizing of an SVG.

EmacsyswF3l

EmacsuoXv6t

Open questions:

when performing an operation on an instance of SvgImage, we create and return a new instance of the class, editing and passing on the underlying ElementTree. Should we instead copy the underlying tree and use that for the new instance, to preserve old instances?

jams2 commented 2 years ago

Here's a sample of SVGs rasterised to PNG using svglib - note the horizontal line artifacts.

2022-10-10_11-28

It appears to be a known issue:

I get much better results with cairosvg:

cairotiger

What are your thoughts on changing to cairosvg for the default rasterisation plugin @kaedroho ? There's a few more installation steps required (libffi-dev, python-dev, etc).

dopry commented 2 years ago

@jams2 re: cairo vs svglib, My vote would be on Cairo for better image quality. The artifacts you demo'd generated by svglib would be unacceptable to my clients.

dopry commented 2 years ago

@jams2 I've been doing some integration testing of this branch with wagtail. I've noticed a few issues...

floating point width and height raise errors in wagtail.images.image_operations.ImageTransform._check_size I have some SVGs that define their viewBox with floating values, ex)viewBox="0 0 569.1 217.9". Many vector illustration tools will write in float values. ImageTransform._check_size checks that the size is a two tuple of ints. It seems like wagtail should be update to expect floating sizes so that early rounding doesn't end up generating bad results with poorly mastered SVGs. Take the following scenario as an example of how things can go wrong.

A designer with a poorly scaled coordinate system zoomed in way to far in illustrator and ended up with a viewbox like 0 0 0.320 0.480. If we rounded this at load it would result in a 1x1 image completely losing the aspect ratio if max: 320x480 were applied. The result would be a very poorly scaled and likely aggressively skewed the image.
SVGs with embedded DOCTYPE cannot be loaded, they raise defusedxml.common.DTDForbidden, see the image below as an example of an SVG with embedded DOCTYPE. This may be an issue upstream, I'm not familiar enough with the XML library in use to know if DTD downloads can be disabled and the error suppressed.

jams2 commented 1 year ago

Hi @dopry ,

thanks for the feedback!

SVGs with embedded DOCTYPE cannot be loaded, they raise defusedxml.common.DTDForbidden, see the image below as an example of an SVG with embedded DOCTYPE. This may be an issue upstream, I'm not familiar enough with the XML library in use to know if DTD downloads can be disabled and the error suppressed.

This is by design. Allowing DTDs leaves the door open for a number of different exploits. The defusedxml readme has a good overview.

However, we met yesterday and agreed that we should follow the principle of "secure by default", with the option for developers to allow unsafe operations if desired. With that in mind, once this PR is merged we'll update the Image interface with an allow_unsafe argument (or something like that - it might need to be more granular), so you can allow DTDs etc. if you desire. I imagine this will be exposed in Wagtail via settings.

floating point width and height raise errors in wagtail.images.image_operations.ImageTransform._check_size I have some SVGs that define their viewBox with floating values, ex)viewBox="0 0 569.1 217.9". Many vector illustration tools will write in float values. ...

This will need to be handled in Wagtail. Based on your feedback I've updated this PR to remove a couple of erroneous calls to round that may be problematic in certain circumstances when transforming SVGs, so Willow should handle SVGs with non-integer sizes without issue.

Regarding rasterisation - we decided to remove the svglib plugin from this PR as there's no point in shipping something that doesn't perform well. We will add a cairo plugin once this is merged - I've got a branch going that just needs some tests and docs.

dopry commented 1 year ago

@jams2 I understand different XML attack vectors. I propose that there may be more graceful ways of handling SVGs with DTDs rather than raising an error when a DTD is present or asking users to accept that unsafe behaviors. Here are my thoughts...

DTDs are only dangerous if the underlying XML or SVG library are actually exploitable through DTD/Entity attacks. I think we should understand the policy of our upstream SVG Library and whether they allow these exploitable features in the first place.
Input Sanitizing - if our libs are exploitable, rather than rejecting DTDs we could always strip out dangerous elements such as ENTITY or just ENTITY ... SYSTEM, xi:include, and show users a warning when in 'safe' operating mode. This will minimize the user interaction necessary to correct poorly or maliciously mastered SVGs.

I'd lean toward sanitizing because graphic designers in my experience don't pay attention to the contents of their SVG output and they're often poorly mastered unless it's a really technical designer. I also don't think many of the editors I work with would open up the SVG in a text editor and fix it... If wagtail rejected a SVG I would expect the following to happen in what are likely worse case scenarios... Editor emails designer to say image isn't working, back and forth discussion, a few exports and maybe days or hours later the image is updated. Admittedly this would probably only happen a few times, but I feel it would leave a poor impression of the UX to new users. It would force them to learn about the security issues involved in SVG mastering... but we could achieve a similar education with prominent warnings and letting users know what we fixed and why... In that case I figure and editor would share the feedback with a designer out of concern and then the designer might do a better job exporting their SVGs.

jams2 commented 1 year ago

@dopry this is great feedback, thanks. I empathise with the scenario you describe and agree we should do our best to avoid a situation that discourages users from using SVGs in their Wagtail site.

In this PR I'm using xml.etree.ElementTree, which uses Expat under the hood. According to the Python docs, etree:

is not vulnerable to external entity expansion;
does not retrieve external DTDs; and
is not vulnerable to decompression bombs.

It is vulnerable to quadratic blowup and billion laughs for Expat versions < 2.4.1. (mitigated by use of defusedxml)

etree is potentially vulnerable to file expansion/retrieval by way of XML Inclusion (xi:include elements), but this requires explicit use of a helper module. We're not doing this so it's not a concern.

The best practices list in the defusedxml readme (based on this presentation) was a motivating factor for my original stance that we should forbid DTDs. However, beyond the issues listed above, I can't find concrete examples of attacks that target parsers that allow DTDs. DTDs are allowed by default in defusedxml, and the author of that library does not consider them to be a risk on their own. With these points in mind, I'm in favour of allowing DTDs, as long as we're addressing the vectors we're vulnerable to. This should result in more SVG files being usable without issue.

Sanitising SVG files is an interesting thought. I can't find an existing library that does this. Perhaps extending etree (and the underling XMLParser class) like defusedxml does would be a viable approach to creating something. I'm not sure that this should be part of Willow, however. That would need maintainers' input.

dopry commented 1 year ago

@jams2 thanks for digging into those details. I think allowing DTDs is a good starting point and agree sanitizing is probably out of scope for Willow. Sanitizing is probably a better fit for defusedxml.

dopry commented 1 year ago

@kaedroho @ababic Can you guys give this a review. It would be nice to get this merged so work on SVG support in wagtail can move forward.

dopry commented 1 year ago

@jams2 I came across this viewBox this evening which is failing to parse, viewBox="-350 360.1 464 81.9"

I'm going to test an re.split() approach in place of the VIEW_BOX_RE. I think it may be a bit more resilient.

dopry commented 1 year ago

@jams2

I was able to get the viewBox parsing working for my case with https://github.com/dopry/Willow/commit/e5725bc7cbac8132b9b88492451b963dbce5a030

I added support for % width and height in https://github.com/dopry/Willow/commit/aa9e41b0ab78ed6c6a8e74f7f6f755ef03f6e156. Affinity sets width and height to 100% by default. Since we don't know the display context I used the viewbox as a basis. I'm not sure if this is correct, but seems to be working in my test case. I need to read deeper into the spec. It maybe that just using the viewport width and height as the intrinsic width and height would be better in most cases and treat width and height as properties that belong to the presentation device instead of being intrinsic to the image. I need to read up more on the implications of https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/height, in particular Note: Starting with SVG2, height is a Geometry Property meaning this attribute can also be used as a CSS property for

I tried cleaning that SVG up by importing into affinity, and the output sets width and height to 100%, raising the Unable to handle relative size units error. I was able to update the code to support those scenarios...

I've noticed that the SVG with viewBox="-350 360.1 464 81.9" is not resizing properly. The origin is being set to 0, but the coordinates are not being adjust so nothing is visible in the viewbox after resizing. Affinity resets the origin, but adds a transform to the root group. I'll try to dig into this a bit more to figure out what is going on.

jams2 commented 1 year ago

Thanks @dopry, sorry for the slow response, I've been offline. I've opted to stick with the regular expression method for parsing viewBoxes for now as I was able to simplify the pattern to a point that I think it's easy enough to read/maintain. I will look into the resizing issue tomorrow.

jams2 commented 1 year ago

@dopry I wasn't able to reproduce the resize issue when interacting with your SVG directly with Willow, but I did unearth an issue in the cropping code that I suspect is the cause. If you noticed the issue while using Wagtail's image transformations it could be that one of them is calling SvgImage.crop. The issue was not handling non-zero viewBox minX/minY when cropping SVGs, and is fixed.

Regarding allowing relative width/height - I agree that if the default output of common SVG editing software includes dimensions with percentage units, we should be able to handle it. My understanding of relative width/height attrs is that the viewport size will be calculated relative to the parent container (in the context of HTML/CSS).

Perhaps there is an argument in favour of allowing SVGs with percentage width/height attrs so long as a viewBox is defined, and disregarding the height/width attrs - initially treating the SVG as an SVG that only has a viewBox defined, and treating the viewBox width/height as the intrinsic size of the image. This might make the most sense, as:

width and height are overwritten in the crop and resize functions; and
trying to account for the different ways an SVG with relative height/width attrs is rendered in CSS depending on the height/width attrs of the parent element seems futile.

dopry commented 1 year ago

I did encounter the issue through a wagtail transform, so I think that crop issue you found was likely this issue. i'll try to update with your current code early next week to verify.

I am currently using a variation on my branch and using viewbox as scaling by the the width and height%, but I think, for the same reasons you enumerate that I should be using the unscaled viewBox dimensions as the intrinsic size of the image.

jams2 commented 1 year ago

That should cover it - the SVG module will now handle SVGs with relative width and/or height attributes, essentially treating them as if they were not defined. I think this is the only reasonable way to handle them, as the concept of a size relative to a parent element is relevant to a rendering context only, and in Willow there is no concept of a rendering context/surface as such.

It would arguably be reasonable to ignore height/width attributes entirely, using only the viewBox as an SVG's intrinsic size. However, it's conceivable that an illustrator may use the attributes purposefully, or without including an explicit viewBox.

I've tested this with my Wagtail branch with SVGs with many combinations of height/width/viewBox attrs:

no attrs
both relative
both absolute
one relative, one absolute
all of the above, in combination with viewBox/no viewBox
viewBox with negative origins
viewBox with 0 origins
viewBox positive offset origins

and it seems to be working well.

dopry commented 1 year ago

@jams2 I can't tell you how much your work is appreciated on this. I don't look forward to unmonkey patching what I did to get this running on my project, but I'm super excited to see it make it way upstream!

gasman commented 1 year ago

Currently in the middle of reviewing this, and it would be really good to have some documentation at this point - both for the benefit of end-users, and to define the scope of the changes in this PR.

Even if the aim is to get the SVG backend fully in parity with the existing ones, I think we need at minimum:

svglib added to the list of supported libraries on the front page
the "Pillow / Wand only" lines on the operation reference list updated where necessary
documentation for the save_as_png method

dopry commented 1 year ago

@gasman,

svglib isn't used, so shouldn't need to be added to a list of supported libraries. svg operations are done locally in willow/svg.py without dependency on external libraries.
there are no svglib restricted operations since svglib is not used, so there are no svglib only operations to document unless I'm misunderstanding your request.
this pr doesn't touch the save_as_png operation, did you mean save_as_svg?

gasman commented 1 year ago

@dopry Ah yep! I was reviewing the PR commit-by-commit, and hadn't yet reached the one where svglib was removed... In that case, the front page of the docs should state that SVG is supported with no additional libraries, and the "Pillow / Wand only" lines should probably be removed (since they do now work, albeit in a limited capacity, without those libraries installed). I meant save_as_svg, yep.

jams2 commented 1 year ago

@gasman apologies for any confusion, there was quite a lot of discussion and discovery along the way with this one - perhaps I should have updated the original PR description to reflect. I will get some documentation together :+1:

gasman commented 1 year ago

The code here makes sense and all works well in my testing - happy to merge this once the documentation updates are in place.

wagtail / Willow

Add SVG support #99