Closed guyuqi closed 5 years ago
Wow, this is awesome! Can you confirm that it passes the unit tests when using_NEON == True? Our continuous integration only tests x86.
Our environment:
Linux yq-bitsfl 4.12.0-222-arm64 #1 SMP Debian 4.12.0.linaro.222-1 (2017-08-01) aarch64 aarch64 aarch64 GNU/Linux
Gcc:
Target: aarch64-linux-gnu
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10)
The macro __ARM_NEON is defined by aarch64 gcc, so using_NEON == True.
With that, the test cases in "bitshuffle/bitshuffle/tests" all pass:
root@yq-bitsfl:~/bitshuffle/bitshuffle/tests# python test_ext.py
........................................................
----------------------------------------------------------------------
Ran 56 tests in 1.376s
OK
root@yq-bitsfl:~/bitshuffle/bitshuffle/tests# python test_h5filter.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.106s
OK
root@yq-bitsfl:~/bitshuffle/bitshuffle/tests# python test_regression.py
.
----------------------------------------------------------------------
Ran 1 test in 0.035s
OK
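For readers unfamiliar with the __ARM_NEON check mentioned above, here is a minimal sketch of how a C source file can detect NEON at compile time. The names HYP_USE_NEON, hyp_using_neon and hyp_simd_name are hypothetical and chosen for illustration; they are not taken from the bitshuffle sources.

```c
/* Illustrative only: HYP_USE_NEON and the hyp_* functions are hypothetical
 * names, not bitshuffle's. __ARM_NEON is the predefined macro that aarch64
 * gcc sets when NEON is available. */
#if defined(__ARM_NEON)
#define HYP_USE_NEON 1
#else
#define HYP_USE_NEON 0
#endif

/* Returns 1 if this translation unit was compiled with NEON, else 0. */
int hyp_using_neon(void) {
    return HYP_USE_NEON;
}

/* Human-readable label for the detected configuration. */
const char *hyp_simd_name(void) {
    return hyp_using_neon() ? "NEON" : "no NEON";
}
```

On an x86 build, such as the CI mentioned earlier, hyp_using_neon() would return 0. On the aarch64 toolchain listed above it would return 1.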
@kiyo-masui Are there any benchmark tools in bitshuffle? Could you please tell me how to benchmark bitshuffle when it uses SSE and AVX instructions? Thanks!
To benchmark, change the TIME variable to 8 in test_ext.py and the REPEATC variable to 32 in ext.pyx. Then rerun setup.py and run test_ext.py. It should print timings.
@kiyo-masui, the tests pass on the Arm64 platform. Any comments on this PR would be appreciated. Thanks!
Very nice overall. Note the return codes listed in the comments of bitshuffle_core.h: -11 means SSE is missing. You probably want to add a new return code (-13) for missing NEON.
Other than that, looks good!
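As a rough sketch of the suggestion above, a NEON-only entry point could return a dedicated error code when the build lacks NEON. Only the values -11 and -13 come from this discussion. The macro names, the function name hyp_neon_kernel and its signature are assumptions made for illustration, not bitshuffle's actual API.

```c
/* Hypothetical names; only the numeric values are from the review comment. */
#define HYP_ERR_NO_SSE  (-11)  /* existing meaning: SSE is missing */
#define HYP_ERR_NO_NEON (-13)  /* proposed: NEON is missing */

/* Stand-in for a NEON-only kernel: returns the number of bytes "processed"
 * when built with NEON, or HYP_ERR_NO_NEON otherwise. */
long hyp_neon_kernel(const void *in, void *out, unsigned long size) {
    (void)in;
    (void)out;
#if defined(__ARM_NEON)
    /* A real kernel would run its NEON intrinsics here. */
    return (long)size;
#else
    (void)size;
    return HYP_ERR_NO_NEON;
#endif
}
```

A caller can then tell "built without NEON" apart from "built without SSE" by comparing a negative result against the two codes.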
Thanks for the comments. Updated!
NEON technology is an advanced SIMD (Single Instruction, Multiple Data) architecture for the Arm Cortex-A series processors. This patch uses NEON to accelerate bitshuffle on the Arm64 platform.
Change-Id: I97ca8e5bc0bdc26729ace7c9790c94fab8c40842
Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>