Open jry2 opened 9 years ago
Nice! Have you checked at a higher level how much time is spent in flattenPath, qsort, and rasterizing the sorted edges? I expect the rasterization to dominate, but I'm just curious. Also, what proportion of `nsvg__rasterizeSortedEdges` is spent in `nsvg__scanlineSolid`?
Yes, see attached screenshots from release (upstream) build. Rendering 9000x9000px.
`nsvg__unpremultiplyAlpha()` is another SSE2 candidate.
Same options but rendering to 900x900 target.
You should do it in NEON! What does your patch look like?
I'm working on an x86/x64 project for Windows, so ARM NEON would not help. I will publish my patch.
Another benchmark (Ghostscript_Tiger.svg rendered 9000x9000px), tested x86 vs x64 performance.
Upstream version x86: 4120ms, x64: 2960ms
SSE2 version x86: 3100ms, x64: 2270ms
Edit: there is something fishy with the x86 / x64 builds. The difference is in `nsvg__fillActiveEdges`: 861ms for the x86 build vs 70ms for the x64 build. Binary output is different too.
[Screenshots: x86 and x64 builds]
Edit2: OK, nothing fishy, just another example of SSE optimization. It turned out that in the x64 build `nsvg__fillScanline` is optimized with SSE instructions, while in the x86 build it is not. I have SSE optimization enabled at the application level in the compiler. The difference is the ~800ms mentioned above.
Different output from x86 / x64 builds could be related to http://stackoverflow.com/questions/22710272/difference-in-floating-point-arithmetics-between-x86-and-x64. There are only small differences, in most cases just about one. I didn't investigate this one.
This seems like a nice optimization. Have you ever considered creating a pull request to get this merged?
You should do it with Intel ISPC.
Just putting it here. I still need to check the speed. The call to `nsvg__unpremultiplyAlpha` inside `nsvgRasterize` also needs to be disabled (commented out).
Benchmark on my PC: rendering Ghostscript_Tiger.svg, measuring `nsvgRasterize` time.
Upstream NanoSVG: 900x900 ~57ms, 9000x9000 ~3210ms
Upstream NanoSVG ("Defringe" disabled): 900x900 ~54ms, 9000x9000 ~2910ms
SSE2 Optimized NanoSVG: 900x900 ~32ms, 9000x9000 ~1130ms
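For anyone wanting to reproduce numbers like these, here is a minimal timing sketch. It is not the harness used above; `best_time_ms` and `count_calls` are hypothetical names, and the assumption is that you replace the workload with a small wrapper that calls `nsvgRasterize` on an already parsed image. Note that `clock()` reports CPU time on most platforms and wall-clock time with the Microsoft C runtime, so results are only comparable on the same machine and toolchain.

```c
#include <time.h>

/* Hypothetical helper: calls fn(ctx) `runs` times and returns the best
 * (smallest) time of a single call in milliseconds, or -1.0 if runs <= 0.
 * Taking the best of several runs reduces noise from caches and scheduling. */
static double best_time_ms(void (*fn)(void *), void *ctx, int runs)
{
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        clock_t t0 = clock();
        fn(ctx);
        double ms = 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}

/* Placeholder workload: counts how often it was called. In a real
 * benchmark this would be a wrapper around the rasterizer call. */
static void count_calls(void *ctx)
{
    int *n = (int *)ctx;
    (*n)++;
}
```

Calling `best_time_ms(count_calls, &n, 5)` runs the workload five times and returns the fastest single run.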
I tried an SSE2 version of `nsvg__scanlineSolid()` with the `NSVG_PAINT_COLOR` code path converted. Benchmark on my i5 661 @ 3.5GHz, Windows 7 x64, Visual Studio 2015 RC, x86 release target. Rendering Ghostscript_Tiger.svg, measuring `nsvgRasterize()` time.
Upstream NanoSVG 900x900px: 68ms, 9000x9000px: 4256ms
SSE2 NanoSVG 900x900px: 60ms, 9000x9000px: 3125ms
Broken `nsvg__scanlineSolid` NanoSVG 900x900px: 44ms, 9000x9000px: 1895ms. Note: this version does nothing in `nsvg__scanlineSolid()`, it just returns. Output is just an empty rectangle.
Some improvement, but nothing stellar. I haven't used SSE before, so maybe someone more experienced could do better. Is anyone interested in my quick & dirty patch? The output PNG is binary identical for the upstream and SSE2 versions.
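To make the idea concrete, here is a rough sketch of what an SSE2 solid-color blend can look like. This is not the patch discussed in this thread. It assumes premultiplied "source over" blending of one RGBA color scaled by a per-pixel coverage byte, and a divide-by-255 approximation of the form `((x + 1) * 257) >> 16`; the names `div255`, `blend_solid_scalar` and `blend_solid_sse2` are made up for illustration. The SSE2 path handles two pixels per iteration in 16-bit lanes and is written to produce the same bytes as the scalar loop.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Approximate x / 255 for 0 <= x <= 255 * 255 (assumed rounding rule). */
static int div255(int x) { return ((x + 1) * 257) >> 16; }

/* Scalar reference: blend a solid color over dst (RGBA bytes) by coverage. */
static void blend_solid_scalar(uint8_t *dst, int count, const uint8_t *cover,
                               int cr, int cg, int cb, int ca)
{
    for (int i = 0; i < count; i++, dst += 4) {
        int a = div255(cover[i] * ca);   /* effective source alpha */
        int ia = 255 - a;
        dst[0] = (uint8_t)(div255(cr * a) + div255(ia * dst[0]));
        dst[1] = (uint8_t)(div255(cg * a) + div255(ia * dst[1]));
        dst[2] = (uint8_t)(div255(cb * a) + div255(ia * dst[2]));
        dst[3] = (uint8_t)(a + div255(ia * dst[3]));
    }
}

/* Same rounding per unsigned 16-bit lane: high half of (x + 1) * 257. */
static __m128i div255_epu16(__m128i x)
{
    return _mm_mulhi_epu16(_mm_add_epi16(x, _mm_set1_epi16(1)),
                           _mm_set1_epi16(257));
}

/* SSE2 variant: two pixels (eight 16-bit lanes) per iteration. */
static void blend_solid_sse2(uint8_t *dst, int count, const uint8_t *cover,
                             int cr, int cg, int cb, int ca)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i c255 = _mm_set1_epi16(255);
    /* The alpha lane holds 255, because div255(255 * a) == a. */
    const __m128i color = _mm_set_epi16(255, (short)cb, (short)cg, (short)cr,
                                        255, (short)cb, (short)cg, (short)cr);
    int i = 0;
    for (; i + 2 <= count; i += 2, dst += 8) {
        short a0 = (short)div255(cover[i] * ca);
        short a1 = (short)div255(cover[i + 1] * ca);
        __m128i a = _mm_set_epi16(a1, a1, a1, a1, a0, a0, a0, a0);
        __m128i ia = _mm_sub_epi16(c255, a);
        /* Load 8 bytes (two pixels) and widen them to 16-bit lanes. */
        __m128i d = _mm_unpacklo_epi8(_mm_loadl_epi64((const __m128i *)dst), zero);
        __m128i src = div255_epu16(_mm_mullo_epi16(color, a));
        __m128i bg = div255_epu16(_mm_mullo_epi16(d, ia));
        /* Sums never exceed 255, so packing back to bytes is lossless. */
        __m128i out = _mm_packus_epi16(_mm_add_epi16(src, bg), zero);
        _mm_storel_epi64((__m128i *)dst, out);
    }
    /* Odd pixel at the end goes through the scalar path. */
    blend_solid_scalar(dst, count - i, cover + i, cr, cg, cb, ca);
}
```

Because every intermediate product fits in an unsigned 16-bit lane, the vector path can reuse exactly the same rounding as the scalar one, which is one way to keep the output binary identical.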
The Streaming SIMD Extensions (/arch:SSE) option was enabled for the whole application. There is another boost with Streaming SIMD Extensions 2 (/arch:SSE2) enabled, but there are still old computers with (AMD) CPUs that do not support SSE2.
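One common way around the old-CPU problem is to check for SSE2 at runtime and only then take the SSE2 code path. The sketch below is an assumption about how that could be done, not something from the patch; `has_sse2` is a hypothetical helper. It uses the `__cpuid` intrinsic on MSVC and `__builtin_cpu_supports` on GCC/Clang, and reports no SSE2 on anything else.

```c
#if defined(_MSC_VER)
#include <intrin.h>
/* CPUID leaf 1: bit 26 of EDX signals SSE2 support. */
static int has_sse2(void)
{
    int info[4];            /* EAX, EBX, ECX, EDX */
    __cpuid(info, 1);
    return (info[3] >> 26) & 1;
}
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
static int has_sse2(void)
{
    return __builtin_cpu_supports("sse2") != 0;
}
#else
/* Unknown compiler or non-x86 target: stay on the scalar path. */
static int has_sse2(void)
{
    return 0;
}
#endif
```

A runtime check like this only helps if the SSE2 code lives in its own translation unit. When the whole application is built with /arch:SSE2, the compiler is free to emit SSE2 instructions anywhere, including before the check runs.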