Enable compilation/optimisation on powerpc

kif commented 3 years ago

On PowerPC, gcc and clang offer an automatic translation of the SSE2 code to Altivec.

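For arm32 and arm64, both the -mcpu and -march options are available. On Intel x86, -mcpu is not available as an architecture selector (GCC accepts it only as a deprecated synonym for -mtune), so -march has to be used. On PowerPC, the -march option does not exist, so -mcpu has to be used. For other architectures (mips, ...), -mcpu is more likely to be present.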
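
A minimal sketch of how a build script could pick the right flag per architecture, following the rules above. The function name `native_arch_flags` is hypothetical and this is not the code from the PR. The `-DNO_WARN_X86_INTRINSICS` define is an assumption based on what GCC's PowerPC compatibility headers ask for before they accept x86 intrinsics.

```python
import platform


def native_arch_flags(machine=None):
    """Return compiler flags that target the build host (illustrative only).

    `machine` defaults to platform.machine(); passing it explicitly makes
    the function easy to try out for other architectures.
    """
    machine = (machine or platform.machine()).lower()

    if machine in ("x86_64", "amd64", "i386", "i686"):
        # x86: only -march selects the instruction set.
        return ["-march=native"]

    if machine.startswith(("ppc", "powerpc")):
        # PowerPC: -march does not exist, use -mcpu instead.
        # Assumption: the define lets the compiler's SSE2 compatibility
        # headers translate the intrinsics to AltiVec without erroring out.
        return ["-mcpu=native", "-DNO_WARN_X86_INTRINSICS"]

    if machine.startswith(("arm", "aarch64")):
        # ARM: both options exist; -mcpu is picked here arbitrarily.
        return ["-mcpu=native"]

    # Other architectures: -mcpu is assumed to be the more likely option.
    return ["-mcpu=native"]


if __name__ == "__main__":
    print(platform.machine(), native_arch_flags())
```

The returned list could be appended to the `extra_compile_args` of a setuptools `Extension`. Whether `native` is accepted as a value still depends on the compiler version, so a real build would want a fallback.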

kif commented 3 years ago

Here is a simple demo program:

#!/usr/bin/python3
# Time how long it takes to read every frame of a dataset through the HDF5 filter plugin.
import sys
import time

import h5py

with h5py.File(sys.argv[1], "r") as h:
    t0 = time.perf_counter()
    # Iterating over the dataset reads one frame at a time.
    for i, f in enumerate(h["entry_0000/measurement/data"]):
        npa = f[()]
    t1 = time.perf_counter()
print(f"Time to read {i+1} frames of size {npa.shape}: {t1-t0:.4f}s. {(i+1)/(t1-t0):.2f} fps")

kif commented 3 years ago

And here are the results:

~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8/bitshuffle/plugin ./read_speed.py eiger_0000.h5 
Time to read 1100 frames of size (2162, 2068): 21.2342s. 51.80 fps
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8-ref/bitshuffle/plugin ./read_speed.py eiger_0000.h5 
Time to read 1100 frames of size (2162, 2068): 34.5846s. 31.81 fps
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8/bitshuffle/plugin ./read_speed.py eiger_0000.h5 
Time to read 1100 frames of size (2162, 2068): 21.2336s. 51.80 fps

The activation of the "SSE2" code on an IBM POWER9 cuts the read time by about 39% (21.2 s instead of 34.6 s), i.e. about 63% more frames per second (51.80 fps instead of 31.81 fps).

james-s-willis commented 3 years ago

Hey @kif, thanks for these changes! Are you able to update your fork? I tried myself but I didn't have the permissions. Then I can merge this PR. Thanks.

james-s-willis commented 3 years ago

Merged in #102.

kiyo-masui / bitshuffle

Enable compilation/optimisation on powerpc #87