kiyo-masui / bitshuffle

Filter for improving compression of typed binary data.

Leverage Arm64 NEON instructions on bitshuffle #71

Closed: guyuqi closed this 5 years ago

guyuqi commented 5 years ago

NEON is an advanced SIMD (Single Instruction, Multiple Data) architecture extension for Arm Cortex-A series processors. This patch uses NEON to accelerate bitshuffle on the Arm64 platform.

Change-Id: I97ca8e5bc0bdc26729ace7c9790c94fab8c40842
Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>

kiyo-masui commented 5 years ago

Wow, this is awesome! Can you confirm that it passes the unit tests when using_NEON == True? Our continuous integration only tests x86.

guyuqi commented 5 years ago

Our environment: Linux yq-bitsfl 4.12.0-222-arm64 #1 SMP Debian 4.12.0.linaro.222-1 (2017-08-01) aarch64 aarch64 aarch64 GNU/Linux

GCC:

Target: aarch64-linux-gnu
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.10)

The macro __ARM_NEON is defined by the aarch64 GCC and using_NEON == True. With that, the test cases under "bitshuffle/bitshuffle/tests" all pass:

root@yq-bitsfl:~/bitshuffle/bitshuffle/tests# python test_ext.py
........................................................
----------------------------------------------------------------------
Ran 56 tests in 1.376s

OK
root@yq-bitsfl:~/bitshuffle/bitshuffle/tests# python test_h5filter.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.106s

OK
root@yq-bitsfl:~/bitshuffle/bitshuffle/tests# python test_regression.py
.
----------------------------------------------------------------------
Ran 1 test in 0.035s

OK

guyuqi commented 5 years ago

@kiyo-masui Are there any benchmark tools in bitshuffle? Could you please tell me how to benchmark bitshuffle when it uses SSE and AVX instructions? Thanks!

kiyo-masui commented 5 years ago

To benchmark, change the TIME variable to 8 in test_ext.py and the REPEATC variable to 32 in ext.pyx. Then rerun setup.py and run test_ext.py. It should print timings.
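
For readers unfamiliar with this style of benchmark, the following is a minimal sketch of a repeat-and-keep-the-best timing loop. It is not bitshuffle's own harness: `best_time`, `toy_kernel`, and the parameter names `outer_repeats` and `inner_repeats` are hypothetical, and the defaults 8 and 32 merely echo the TIME and REPEATC settings above on the assumption that both control repetition counts.

```python
import time


def best_time(fn, data, outer_repeats=8, inner_repeats=32):
    """Return the best observed per-call wall time of fn(data), in seconds.

    The inner loop amortizes timer overhead over several calls; the outer
    loop keeps the minimum to reduce noise from other processes.
    """
    best = float("inf")
    for _ in range(outer_repeats):
        start = time.perf_counter()
        for _ in range(inner_repeats):
            fn(data)
        per_call = (time.perf_counter() - start) / inner_repeats
        best = min(best, per_call)
    return best


def toy_kernel(buf):
    # Stand-in workload for illustration only; NOT a bitshuffle routine.
    return bytes(b ^ 0xFF for b in buf)


if __name__ == "__main__":
    data = bytes(range(256)) * 64  # 16 KiB of sample input
    seconds = best_time(toy_kernel, data)
    print(f"{len(data) / seconds / 1e6:.1f} MB/s")
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since interference from the rest of the system can only make a run slower.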

guyuqi commented 5 years ago

@kiyo-masui, the tests pass on the Arm64 platform. Any comments on this PR would be appreciated. Thanks!

kiyo-masui commented 5 years ago

Very nice overall. Note the return codes listed in the comments of bitshuffle_core.h. -11 means SSE is missing. You probably want to make a new return code (-13) for missing NEON.

Other than that, looks good!
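
The return-code convention can be illustrated with a small sketch. Only the two codes named in this thread are included (-11 for missing SSE, and -13 as the value proposed here for missing NEON). The helper `describe_return_code` is hypothetical, and treating non-negative values as success is an assumption rather than something stated above.

```python
# Codes taken from this discussion only; bitshuffle_core.h lists others
# that are deliberately not reproduced here.
KNOWN_CODES = {
    -11: "missing SSE",
    -13: "missing NEON",  # value proposed in this thread
}


def describe_return_code(code):
    """Map a return code to a short description (hypothetical helper)."""
    if code >= 0:
        # Assumption: non-negative return values indicate success.
        return "ok"
    return KNOWN_CODES.get(code, f"unrecognized error code {code}")


if __name__ == "__main__":
    for code in (0, -11, -13):
        print(code, describe_return_code(code))
```

Giving each missing instruction set its own code lets a caller tell an unsupported-hardware build apart from other failures.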

guyuqi commented 5 years ago

> Very nice overall. Note the return codes listed in the comments of bitshuffle_core.h. -11 means SSE is missing. You probably want to make a new return code (-13) for missing NEON.
>
> Other than that, looks good!

Thanks for the comments. Updated!