scour-project / scour

Scour - An SVG Optimizer / Cleaner
Apache License 2.0

Scouring an already scoured SVG can sometimes produce a smaller SVG #124

Open HatScripts opened 7 years ago

HatScripts commented 7 years ago

I ran the following Scour 0.35 command with image.svg (2829 bytes) as the input file:

scour -i image.svg -o image-scoured.svg --enable-viewboxing --enable-id-stripping --enable-comment-stripping --shorten-ids --indent=none

This produced image-scoured.svg (1287 bytes). Interested to see the result of scouring an already scoured SVG, I ran the same command again, using the output from the previous command as the input:

scour -i image-scoured.svg -o image-scoured-scoured.svg --enable-viewboxing --enable-id-stripping --enable-comment-stripping --shorten-ids --indent=none

Surprisingly, scouring the already scoured file with the same settings produces an even smaller file: 1280 bytes. Further scouring does not reduce the file size.

One would think that scouring an SVG multiple times should not produce a smaller output than scouring it once.

Testing 84 different SVG files as input, 8 of them were affected by this bug. Below are the differences between the output files. If necessary I can upload the input files.

3 of them had this same difference:

<g fill="none" font-family="sans-serif" font-size="12" stroke-dasharray="" stroke-miterlimit="10">
<g font-family="sans-serif" font-size="12" stroke-dasharray="" stroke-miterlimit="10">

Diff: -fill="none" (-12 bytes)


<g transform="matrix(1 0 0 1 -.00047728 0)">
<g transform="translate(-.00047728)">

Diff: matrix(1 0 0 1 -.00047728 0) -> translate(-.00047728) (-7 bytes)


<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804c2.6842 0 4.8804-2.1962 4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z"/>
<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804 4.8804-2.1962 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z"/>

Diff: 4.8804c2.6842 0 4.8804-2.1962 4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0 -> 4.8804 4.8804-2.1962 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0 (-10 bytes)


<path d="m28 7c-2.209 0-4 1.79-4 4v25c-1.657 0-3 1.344-3 3s1.343 3 3 3c1.305 0 2.403-0.838 2.816-2h1.184 20v-33h-20z" fill="#42A5F5"/>
<path d="m28 7c-2.209 0-4 1.79-4 4v25c-1.657 0-3 1.344-3 3s1.343 3 3 3c1.305 0 2.403-0.838 2.816-2h21.184v-33h-20z" fill="#42A5F5"/>

Diff: 2h1.184 20v-33h-20z -> 2h21.184v-33h-20z (-2 bytes)


<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804c2.6842 0 4.8804-2.1962 4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z" fill="#455a64"/>
<path d="m23.98 22.62c-2.6842 0-4.8804 2.1962-4.8804 4.8804s2.1962 4.8804 4.8804 4.8804 4.8804-2.1962 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0 6.9139c-1.1388 0-2.0335-0.89474-2.0335-2.0335s0.89474-2.0335 2.0335-2.0335 2.0335 0.89474 2.0335 2.0335-0.89474 2.0335-2.0335 2.0335z" fill="#455a64"/>

Diffs (-10 bytes):
4.8804c2.6842 0 -> 4.8804
4.8804-4.8804s-2.1962-4.8804-4.8804-4.8804zm0 -> 4.8804-4.8804-2.1962-4.8804-4.8804-4.8804zm0


<path d="m20 7c-1.105 0-2 0.895-2 2v3h-13v29h13 12 13v-32c0-1.105-0.895-2-2-2h-8c-1.105 0-2 0.895-2 2v3h-1v-3c0-1.105-0.895-2-2-2h-8z" fill="#8bc7f8"/>
<path d="m20 7c-1.105 0-2 0.895-2 2v3h-13v29h38v-32c0-1.105-0.895-2-2-2h-8c-1.105 0-2 0.895-2 2v3h-1v-3c0-1.105-0.895-2-2-2h-8z" fill="#8bc7f8"/>

Diff: 2v3h-13v29h13 12 13v-32c0-1.105-0.895-2-2-2h-8c-1.105 -> 2v3h-13v29h38v-32c0-1.105-0.895-2-2-2h-8c-1.105 (-6 bytes)

Ede123 commented 7 years ago

This is obviously not optimal.

I can think of two options:

  1. Use a kind of "brute-force" approach and add an option to run Scour multiple times (until the file size does not change anymore).
  2. Figure out if we can improve the optimization algorithms to always result in the smallest size possible.

While 1. is not a very clean solution, I'm afraid 2. might require a considerable amount of effort, and since we're talking about saving only a few bytes, that time might be spent more effectively elsewhere.
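To make option 1 concrete, here is a minimal sketch of the "run until the size stops changing" loop. It is not Scour code: `scour_until_stable` and its `optimize` parameter are hypothetical names, and `optimize` stands in for whatever performs a single Scour pass on an SVG string. The `max_passes` cap is an added assumption to guard against an optimizer that never settles.

```python
from typing import Callable, Tuple


def scour_until_stable(
    svg: str,
    optimize: Callable[[str], str],  # hypothetical: one optimization pass
    max_passes: int = 10,            # assumed safety cap, not a Scour option
) -> Tuple[str, int]:
    """Repeat `optimize` until the output stops shrinking.

    Returns the smallest result seen and the number of passes that were run
    (including the final pass that brought no improvement).
    """
    best = svg
    passes = 0
    for _ in range(max_passes):
        candidate = optimize(best)
        passes += 1
        # Compare encoded sizes, since file size is what matters here.
        if len(candidate.encode("utf-8")) >= len(best.encode("utf-8")):
            break  # no further gain: keep the previous result
        best = candidate
    return best, passes
```

For the file in this report, such a loop would be expected to stop on the third pass, since the second pass still saves 7 bytes and further scouring saves nothing.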

Certainly this needs further investigation to check if there's a straightforward fix for one or more of the individual glitches.

nthykier commented 6 years ago

Hi @HatScripts

The test case in your gist only triggers the transform problem, do you have some examples that trigger issues in the d attribute of the path tag?

nthykier commented 6 years ago

For the matrix -> translate case, it appears to happen due to a precision reduction. AFAICT, the original SVG has:

<g
     id="g3787"
     transform="matrix(1.0000227,0,0,1,-4.7728357e-4,0)">

This is reduced to:

<g id="g3787" transform="matrix(1 0 0 1 -.00047728 0)">

(Note that the first parameter of the matrix is reduced to a plain 1.) Once the matrix has pure 0 and 1 integer parameters, Scour can apply the rewrite from optimizeTransform, which is what happens in the second run.

If the first precision rewrite is acceptable, then we should be able to just apply the same level of "fuzziness" in optimizeTransform before testing for the known patterns.
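A rough sketch of that idea, for illustration only: round the coefficients to the output precision first, then test for the identity-plus-offset pattern. `simplify_matrix` is a hypothetical helper, not Scour's optimizeTransform, and the default of 5 significant digits is an assumption chosen to match the output shown above.

```python
from decimal import Context, Decimal


def simplify_matrix(coeffs, digits=5):
    """Round matrix(a b c d e f) to `digits` significant digits, then
    rewrite it as translate(...) when the rounded a, b, c, d are 1 0 0 1.

    Hypothetical helper; `digits=5` is an assumed precision.
    """
    ctx = Context(prec=digits)
    # create_decimal rounds to the context precision while parsing.
    a, b, c, d, e, f = (ctx.create_decimal(str(v)) for v in coeffs)
    if (a, b, c, d) == (1, 0, 0, 1):
        # translate(tx) is shorthand for translate(tx 0).
        return ("translate", [e] if f == 0 else [e, f])
    return ("matrix", [a, b, c, d, e, f])


# Coefficients from the original SVG in this report.
kind, args = simplify_matrix(["1.0000227", "0", "0", "1", "-4.7728357e-4", "0"])
```

With the pattern test applied after rounding, the rewrite to translate would be reached in a single pass instead of needing a second run.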