It looks like the speedup multipliers for par_simd, ispc, and ispc+tasks are incorrect given the execution times.
par_simd should be 7.37x
ispc should be 2.55x
ispc+tasks should be 5.09x.
And par_simd is 1.45x faster than ispc+tasks.
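As a quick sanity check on that last figure, here is a minimal Python sketch that derives the relative speedup from the corrected multipliers quoted above. The dictionary and function names are only illustrative and are not part of the benchmark repo. It assumes all three multipliers are measured against the same baseline, so the baseline time cancels in the ratio.

```python
# Corrected speedups relative to the baseline, as quoted in this comment.
# Assumption: all three are measured against the same baseline run.
speedup_vs_baseline = {
    "par_simd": 7.37,
    "ispc": 2.55,
    "ispc+tasks": 5.09,
}


def relative_speedup(a, b, table=speedup_vs_baseline):
    """Return how many times faster variant `a` is than variant `b`.

    speedup(x) = t_baseline / t_x, so
    speedup(a) / speedup(b) = t_b / t_a and the baseline time cancels.
    """
    return table[a] / table[b]


ratio = relative_speedup("par_simd", "ispc+tasks")
print(f"par_simd vs ispc+tasks: {ratio:.2f}x")  # prints 1.45x
```

The same function can compare any other pair in the table, for example `relative_speedup("ispc+tasks", "ispc")`.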
Also, have you since repeated these benchmarks with newer versions of the ispc compiler? Currently setup_benchmarks.sh points to 1.9.2 on LLVM 5, which was released almost 7 years ago. We are now at 1.23.0 on LLVM 16. Performance is quite a bit better thanks to the improvements we've made. These days ispc often beats intrinsics implementations.
I'd be curious to see the results with a new compiler binary.