AVX Optimization for Sw Engine

hermet commented 3 years ago

There are still many rasterizing parts that have no SIMD implementation. We need a better, more finely tuned software raster engine.

We can consider applying AVX to some of the rasterizing methods in tvgSwRaster.cpp

such as:

_translucentRect()
_translucentRectAlphaMask()
_translucentRectInvAlphaMask()
_rasterTranslucentRect()
...
_rasterRadialGradientRle()

Some might not suit SIMD (in that case we can skip them), but others would perform better than the basic version.

Submitting an individual patch for each method would be better.
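To make the idea concrete, here is a minimal sketch of what vectorizing one translucent span could look like. It is not ThorVG's actual code: the names blendSpan, blendSpanScalar, blendSpanAvx2 and alphaBlendScalar are hypothetical, the blend formula (premultiplied color plus destination scaled by the inverse alpha, using a shift by 8) is an assumption, and the vector path uses AVX2 rather than plain AVX because 256-bit integer arithmetic is only available from AVX2 on. It also assumes GCC or Clang on x86; elsewhere only the scalar path is built.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    #include <immintrin.h>
    #define SKETCH_HAS_AVX2_PATH 1
#endif

// Scalar reference: scale each 8-bit channel of c by a/256 (a in 0..255).
static inline uint32_t alphaBlendScalar(uint32_t c, uint32_t a)
{
    return ((((c >> 8) & 0x00ff00ff) * a) & 0xff00ff00) +
           ((((c & 0x00ff00ff) * a) >> 8) & 0x00ff00ff);
}

// Scalar translucent span: dst = color + dst * ialpha (color is premultiplied).
static void blendSpanScalar(uint32_t* dst, size_t len, uint32_t color, uint8_t ialpha)
{
    for (size_t i = 0; i < len; ++i) dst[i] = color + alphaBlendScalar(dst[i], ialpha);
}

#ifdef SKETCH_HAS_AVX2_PATH
// Same math, 8 pixels per iteration. Only this function is built with AVX2 enabled.
__attribute__((target("avx2")))
static void blendSpanAvx2(uint32_t* dst, size_t len, uint32_t color, uint8_t ialpha)
{
    const __m256i mask = _mm256_set1_epi32(0x00ff00ff);
    const __m256i va = _mm256_set1_epi16(ialpha);
    const __m256i vcolor = _mm256_set1_epi32(static_cast<int>(color));

    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        __m256i c = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(dst + i));
        // Channels 1 and 3: move down into 16-bit lanes, multiply, keep the high bytes.
        __m256i hi = _mm256_and_si256(_mm256_srli_epi32(c, 8), mask);
        hi = _mm256_andnot_si256(mask, _mm256_mullo_epi16(hi, va));
        // Channels 0 and 2: multiply in place, then shift each 16-bit lane down by 8.
        __m256i lo = _mm256_and_si256(c, mask);
        lo = _mm256_srli_epi16(_mm256_mullo_epi16(lo, va), 8);
        __m256i out = _mm256_add_epi32(vcolor, _mm256_add_epi32(hi, lo));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), out);
    }
    // Leftover pixels that do not fill a whole vector.
    for (; i < len; ++i) dst[i] = color + alphaBlendScalar(dst[i], ialpha);
}
#endif

// Entry point: use the vector path only if the running CPU reports AVX2.
static void blendSpan(uint32_t* dst, size_t len, uint32_t color, uint8_t ialpha)
{
#ifdef SKETCH_HAS_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        blendSpanAvx2(dst, len, color, ialpha);
        return;
    }
#endif
    blendSpanScalar(dst, len, color, ialpha);
}
```

Because both paths compute the same per-channel integer math, the scalar version doubles as a reference when checking each vectorized method in its own patch.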

hermet commented 3 years ago

Some references:
https://www.codeproject.com/Articles/874396/Crunching-Numbers-with-AVX-and-AVX
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX,AVX2&expand=4950
https://github.com/usamadar/SIMD/blob/master/AVXTest/avxtest.cpp

mgrudzinska commented 2 years ago

We encountered a problem when compiling the ThorVG lib with the -mavx flag and running it on a host that does not support AVX.

It can be reproduced on a PC using any program linked against the lib. The lib has to be compiled with -mavx (the lib itself doesn't need to call any AVX intrinsics). I built the shared lib as follows:

    g++ -mavx -c -o lib.o lib.cpp
    gcc -shared -o liblib.so lib.o

The program is compiled without the -mavx flag (LD_LIBRARY_PATH has to be updated):

    g++ main.cpp -o main -Ldirectory_to_the_lib -llib

Since we need to test this on an architecture that doesn't support AVX, the Intel Software Development Emulator can be used: https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html

After the installation we can run main (we pretend to run it on the Nehalem microarchitecture):

    sde -nhm -- ./main

and it will crash. The problem is that VEX-prefixed assembly instructions will be used, and these are not valid on the Nehalem microarchitecture.

So the conclusion is: we cannot compile the whole of ThorVG with the -mavx flag. Only the AVX module can be compiled like that. At runtime we have to decide whether the host supports AVX, and therefore whether the AVX module can be used.
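The runtime decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not what ThorVG or the linked pull requests do: fillScalar, fillAvx, selectFill and FillFn are hypothetical names, the kernel is a plain solid fill chosen for brevity, and the approach relies on GCC/Clang extensions (the target function attribute, __builtin_cpu_init and __builtin_cpu_supports) instead of a separately compiled AVX module.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    #include <immintrin.h>
    #define SKETCH_X86_GNU 1
#endif

using FillFn = void (*)(uint32_t* dst, size_t len, uint32_t color);

// Baseline path: built with the default flags, safe on any host.
static void fillScalar(uint32_t* dst, size_t len, uint32_t color)
{
    for (size_t i = 0; i < len; ++i) dst[i] = color;
}

#ifdef SKETCH_X86_GNU
// AVX path: only this function is allowed to contain VEX-encoded instructions.
__attribute__((target("avx")))
static void fillAvx(uint32_t* dst, size_t len, uint32_t color)
{
    const __m256i v = _mm256_set1_epi32(static_cast<int>(color));
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(dst + i), v);
    }
    for (; i < len; ++i) dst[i] = color;  // tail shorter than one vector
}
#endif

// Ask the CPU once, at runtime, and hand back the matching implementation.
static FillFn selectFill()
{
#ifdef SKETCH_X86_GNU
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return fillAvx;
#endif
    return fillScalar;
}
```

A caller would typically store the result of selectFill() once during engine initialization and call through that pointer afterwards, so the check is not repeated per span.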

mgrudzinska commented 2 years ago

Just as an FYI for now: ALPHA_BLEND for NEON is not as precise as the AVX and C versions. We use vshrn_n_u16(t, 8); to divide the result by 256, but we should divide by 255.
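The size of that error is easy to see in scalar form. The sketch below is not the NEON code and not ThorVG's ALPHA_BLEND; mulDiv256, mulDiv255 and countDiv255Mismatches are hypothetical helpers. It assumes the product of one 8-bit channel and one 8-bit alpha, and compares a plain shift by 8 with an add-and-shift expression that matches integer division by 255 for every such product.

```cpp
#include <cstdint>

// Approximation: a plain >> 8 treats the divisor as 256.
static inline uint8_t mulDiv256(uint8_t c, uint8_t a)
{
    const uint32_t x = static_cast<uint32_t>(c) * a;
    return static_cast<uint8_t>(x >> 8);
}

// floor(c * a / 255) using only adds and shifts.
// Valid for products up to 255 * 255; the intermediate peaks at 65280, so it fits in 16 bits.
static inline uint8_t mulDiv255(uint8_t c, uint8_t a)
{
    const uint32_t x = static_cast<uint32_t>(c) * a;
    return static_cast<uint8_t>((x + 1 + (x >> 8)) >> 8);
}

// Brute-force check of mulDiv255 against real integer division for all inputs.
static int countDiv255Mismatches()
{
    int mismatches = 0;
    for (uint32_t c = 0; c <= 255; ++c) {
        for (uint32_t a = 0; a <= 255; ++a) {
            const uint32_t expected = (c * a) / 255;
            if (mulDiv255(static_cast<uint8_t>(c), static_cast<uint8_t>(a)) != expected) ++mismatches;
        }
    }
    return mismatches;
}
```

With the shift-only version, a full-intensity channel at full alpha comes out as 254 instead of 255, which is the kind of off-by-one the comment above refers to.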

vtorri commented 1 year ago

Note that you have to detect which asm instruction sets can be used at runtime, not at configure time.

hermet commented 1 year ago

@vtorri Yes, that makes sense. ThorVG needs to change this.

mgrudzinska commented 1 year ago

@hermet @vtorri You're right, and we had to deal with this in Tizen; that's why https://github.com/thorvg/thorvg/pull/835 was created. But it was a very long time ago and I don't remember why we decided not to merge/review it.

hermet commented 1 year ago

@mgrudzinska I just skipped the review of your update since it wasn't a high priority.

hermet commented 5 months ago

https://github.com/thorvg/thorvg/wiki/24'-Development-Roadmap

hermet commented 2 months ago

https://github.com/thorvg/thorvg/pull/835
https://github.com/thorvg/thorvg/pull/849

AVX Optimization for Sw Engine #29