kif closed this pull request 3 years ago
Here is a simple demo program:
#!/usr/bin/python3
import numpy
import h5py
import sys
import time

with h5py.File(sys.argv[1], "r") as h:
    t0 = time.time()
    # Iterating over the dataset reads and decompresses one frame at a time.
    for i, f in enumerate(h["entry_0000/measurement/data"]):
        npa = f[()]
    t1 = time.time()
# Report the total read time and the average frame rate.
print(f"Time to read {i+1} frames of size {npa.shape}: {t1-t0:.4f}s. {(i+1)/(t1-t0):.2f} fps")
And here are the results:
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8/bitshuffle/plugin ./read_speed.py eiger_0000.h5
Time to read 1100 frames of size (2162, 2068): 21.2342s. 51.80 fps
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8-ref/bitshuffle/plugin ./read_speed.py eiger_0000.h5
Time to read 1100 frames of size (2162, 2068): 34.5846s. 31.81 fps
~/workspace/bitshuffle$ OMP_NUM_THREADS=1 HDF5_PLUGIN_PATH=./build/lib.linux-ppc64le-3.8/bitshuffle/plugin ./read_speed.py eiger_0000.h5
Time to read 1100 frames of size (2162, 2068): 21.2336s. 51.80 fps
Activating the "SSE2" code path on an IBM POWER9 cuts the read time by about 39% (21.2 s versus 34.6 s), which is roughly 1.6 times the frame rate (51.8 versus 31.8 fps).
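As a quick check of the timings above, here is a small sketch that derives the frame rates, the throughput ratio and the relative time saving from the two measured durations. It assumes the `-ref` build directory is the reference build without the SSE2 code path; the variable names are only illustrative.

```python
# Timings copied from the runs above (1100 frames each).
frames = 1100
t_sse2 = 21.2342  # seconds, build with the SSE2 code path (assumed: non "-ref" build)
t_ref = 34.5846   # seconds, reference build (assumed: "-ref" build)

fps_sse2 = frames / t_sse2          # frames per second with SSE2 enabled
fps_ref = frames / t_ref            # frames per second for the reference build
speedup = t_ref / t_sse2            # throughput ratio (same frame count in both runs)
time_saved = 1 - t_sse2 / t_ref     # fraction of the read time that is saved

print(f"{fps_sse2:.2f} fps vs {fps_ref:.2f} fps")
print(f"throughput ratio: {speedup:.2f}x")
print(f"read time reduced by {time_saved:.1%}")
```

The two figures describe the same measurement: the time saving is relative to the slower run, while the throughput ratio is relative to the slower frame rate, so they should not be compared directly.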
Hey @kif, thanks for these changes! Are you able to update your fork? I tried myself but I didn't have the permissions. Then I can merge this PR. Thanks.
Merged in #102.
On PowerPC, gcc and clang offer an automatic translation of the SSE2 code to Altivec.
For arm32 and arm64, both the -mcpu and -march options are available. On Intel x86 computers, -mcpu does not exist. On PowerPC, the -march option does not exist. For other architectures (mips, ...), -mcpu is more likely to be present.
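The flag selection described above could be sketched as follows. This is only an illustration, not the code from this PR: the helper name `native_tuning_flag` is hypothetical, the machine strings are the usual values returned by `platform.machine()`, and the fallback to -mcpu for unknown architectures simply follows the guess stated in the text.

```python
import platform


def native_tuning_flag(machine=None):
    """Pick a compiler tuning flag for the target machine (hypothetical helper)."""
    machine = (machine or platform.machine()).lower()
    if machine in ("x86_64", "amd64", "i386", "i686"):
        # x86: use -march, since -mcpu is not usable there.
        return "-march=native"
    if machine.startswith(("ppc", "powerpc")):
        # PowerPC: only -mcpu is accepted, -march does not exist.
        return "-mcpu=native"
    if machine.startswith(("arm", "aarch64")):
        # ARM accepts both options; -mcpu is chosen here arbitrarily.
        return "-mcpu=native"
    # Other architectures: assume -mcpu, as suggested in the text.
    return "-mcpu=native"


if __name__ == "__main__":
    print(platform.machine(), "->", native_tuning_flag())
```

In a setup.py, the returned string would typically be appended to the extension's extra compile arguments, ideally after checking that the compiler really accepts it.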