lemire / SIMDCompressionAndIntersection

A C++ library to compress and intersect sorted lists of integers using SIMD instructions
Apache License 2.0

how to get the unaligned version to work? #10

Closed shlomiv closed 8 years ago

shlomiv commented 8 years ago

Hey,

Is there a simple way to get an unaligned codec? I see you have usimdbitpacking.cpp/h, but it doesn't seem easy to make a codec out of it.

By manually sed-ing the loads and stores to _mm_loadu_si128 and _mm_storeu_si128, I verified that on a recent Haswell machine there is very little difference in performance (YMMV), so it makes sense to use unaligned access in my case.

Thanks for this really awesome library, Shlomi

lemire commented 8 years ago

@vadali

Yes, I agree that alignment should not matter much as far as performance is concerned. If someone has a machine where unaligned accesses are much slower, they should probably upgrade at this point.

Would you be willing to issue a PR? It is a simple matter of replacing every load with loadu (or _mm_lddqu_si128) and every store with storeu. One can do it with search and replace in seconds... the hard part is checking that everything still makes sense (there are places where we might check for alignment, and these checks need to be removed).
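To make the substitution concrete, here is a minimal sketch of the load/store pattern in question. It assumes an x86 target with SSE2; `add_constant_unaligned` is a hypothetical helper written for illustration and is not a function from this library. The only point is that `_mm_loadu_si128` and `_mm_storeu_si128` accept pointers that are not 16-byte aligned, whereas `_mm_load_si128` and `_mm_store_si128` require aligned ones.

```cpp
#include <emmintrin.h> // SSE2: _mm_loadu_si128, _mm_storeu_si128, _mm_add_epi32
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the library): adds `delta` to each of the
// `n` 32-bit integers in `in`, four at a time, and writes the sums to `out`.
// Neither pointer has to be 16-byte aligned because only the unaligned
// load/store intrinsics are used.
inline void add_constant_unaligned(const std::uint32_t *in, std::uint32_t *out,
                                   std::size_t n, std::uint32_t delta) {
    assert(n % 4 == 0); // this sketch only handles whole 128-bit blocks
    const __m128i d = _mm_set1_epi32(static_cast<int>(delta));
    for (std::size_t i = 0; i < n; i += 4) {
        // Unaligned load: valid for any address.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(in + i));
        v = _mm_add_epi32(v, d);
        // Unaligned store: valid for any address.
        _mm_storeu_si128(reinterpret_cast<__m128i *>(out + i), v);
    }
}
```

Calling it as `add_constant_unaligned(buf + 1, out + 1, 8, 10)` exercises addresses that are only guaranteed to be 4-byte aligned, which is the case the aligned intrinsics would not tolerate.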

shlomiv commented 8 years ago

@lemire Sure, I already made the change you mentioned and tested it using testcodecs, which seemed to work just fine.

Before I issue a PR I'll turn this change into a macro, so it can be reverted with a simple #define, unless you think it's unneeded.
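Roughly, the macro could look like the sketch below. All names here (`USE_ALIGNED_SIMD`, `MM_LOAD_SI128`, `MM_STORE_SI128`, `ASSERT_SIMD_ALIGNED`, `copy_block`) are placeholders invented for illustration, not what the library uses or what the eventual PR will contain. The idea is that unaligned access is the default, and defining one symbol at compile time brings back the aligned intrinsics together with the alignment checks.

```cpp
#include <emmintrin.h> // SSE2 load/store intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical switch: compile with -DUSE_ALIGNED_SIMD to revert to the
// aligned intrinsics and re-enable the alignment check.
#ifdef USE_ALIGNED_SIMD
#define MM_LOAD_SI128 _mm_load_si128
#define MM_STORE_SI128 _mm_store_si128
#define ASSERT_SIMD_ALIGNED(p) \
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0)
#else
#define MM_LOAD_SI128 _mm_loadu_si128
#define MM_STORE_SI128 _mm_storeu_si128
#define ASSERT_SIMD_ALIGNED(p) ((void)0) // no alignment requirement
#endif

// Hypothetical example of a call site: copies one 128-bit block
// (four 32-bit integers) from `in` to `out` through the macros.
inline void copy_block(const std::uint32_t *in, std::uint32_t *out) {
    ASSERT_SIMD_ALIGNED(in);
    ASSERT_SIMD_ALIGNED(out);
    __m128i v = MM_LOAD_SI128(reinterpret_cast<const __m128i *>(in));
    MM_STORE_SI128(reinterpret_cast<__m128i *>(out), v);
}
```

Routing the alignment checks through the same switch keeps them from firing in the unaligned build while leaving them available when the aligned path is selected.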

lemire commented 8 years ago

@vadali A macro would be fine.