simd-everywhere / simde

Implementations of SIMD instruction sets for systems which don't natively support them.
https://simd-everywhere.github.io/blog/
MIT License

arm: fix some neon2rvv intrinsic function error #1173

Closed zengdage closed 3 months ago

zengdage commented 3 months ago
  1. For vqdmlal_s16/s32, the doubling result may overflow, so vqaddq_s16/32 is needed to saturate it. The same applies to vqdmlsl_s16/32.

  2. The vrdmulh family of functions needs the saturating vqadd function to avoid overflow of the doubling result. For now, though, I have just replaced the RISCV_V_NATIVE implementation with RVV intrinsic functions.

  3. The result of vrshl family function need to keep the sign bit of the origin data. Ifa > 0 && b < 0, the result of (a + (1 << (-b - 1))) maybe overflow into a negative value. And in gcc/clang, >> means the arithmetic shift left, so it will get the incorrect sign bit whithout unsigned extend value.

howjmay commented 3 months ago

@zengdage would you mind helping me to fix the same errors in neon2rvv too? Or I can do it?

zengdage commented 3 months ago

Hi @howjmay, these errors occur because the effect of saturation was not taken into account, so the corner cases did not pass. I suggest running arm-neon-tests (https://github.com/christophe-lyon/arm-neon-tests.git) against your neon2rvv repo. Last year, I used it to find some errors in my own neon2rvv implementation.

arm-neon-tests can be used as follows:

1. Add your own neon2rvv.h in stm-arm-neon.h.
2. Modify the Makefile.
3. Run make -f Makefile.
4. You will find the information about the incorrect results in expected_input4gcc.txt.

I hope the above information helps.

howjmay commented 3 months ago

That helps a lot, thanks!

mr-c commented 3 months ago

Thank you @zengdage !