Open mr-c opened 1 year ago
I want to do this task to get some understanding of SIMD. I am starting by adding AVX512F. Could you tell me what kind of measurement method I can use here?
> I want to do this task to get some understanding of SIMD.

Great, go for it!

> I am starting by adding AVX512F. Could you tell me what kind of measurement method I can use here?

I don't have specific advice for performance measurement or profiling, sorry.
https://doi.org/10.48550/arXiv.2112.06342
Be careful to only use the paper as your reference; I'm told that the compressed source code at the end is not OSS.
- (A) `simde_mm{,256,512}_2intersect_epi{32,64}` functions as shown in Listing 10, page 4; but please confirm that this is still faster than the fallback code on recent GCC/clang using an AVX512F system.
- (B) `simde_x_mm{,256,512}_2intersect_epi{32,64}_mask` functions with AVX512F and plain C implementations for returning only the first mask, `k1` (Listing 7, page 3).
- (C) `simde_x_mm{,256,512}_2intersect_epi{32,64}_mask2` functions: versions of (B) for when the second set of integer vectors is in memory (but not loaded into a `__m512i` register) (Listing 9, page 4) (some other name might be better).
- (D) `simde_x_mm{,256,512}_*_epi16` versions of all of the above for 16-bit vectors with AVX512F and plain C implementations.
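
For anyone picking this up, a scalar reference can be handy for checking the AVX512F versions against. The sketch below is only an illustration, not SIMDe code: `ref_2intersect_epi32` is a hypothetical name, and it assumes the usual VP2INTERSECT semantics, where bit `i` of `k1` is set when `a[i]` equals any element of `b`, and bit `j` of `k2` is set when `b[j]` equals any element of `a`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of SIMDe): scalar reference for a
 * 2intersect on n 32-bit lanes, n <= 16 (16 lanes matches a 512-bit vector).
 * k1 gets one bit per lane of a, k2 gets one bit per lane of b. */
static void ref_2intersect_epi32(const int32_t *a, const int32_t *b, size_t n,
                                 uint16_t *k1, uint16_t *k2) {
  uint16_t m1 = 0;
  uint16_t m2 = 0;

  assert(n <= 16); /* the masks only have 16 bits */

  /* Compare every lane of a with every lane of b. */
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) {
      if (a[i] == b[j]) {
        m1 |= (uint16_t)(1u << i); /* a[i] occurs somewhere in b */
        m2 |= (uint16_t)(1u << j); /* b[j] occurs somewhere in a */
      }
    }
  }

  *k1 = m1;
  *k2 = m2;
}
```

A variant in the spirit of (B) would return only `m1` and skip the `m2` bookkeeping; the 64-bit and 16-bit versions would differ only in the element type and the number of lanes.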