shlomiv closed 8 years ago
@vadali
Yes, I agree that alignment should not matter much as far as performance is concerned. If someone has a machine where unaligned accesses are much slower, they should probably upgrade at this point.
Would you be willing to issue a PR? It is a simple matter of replacing every load with loadu (or _mm_lddqu_si128) and every store with storeu. One can do it with search and replace in seconds... the hard part is checking that everything still makes sense (there are places where we might check for alignment, and these checks need to be removed).
@lemire Sure, I already made the change you mentioned and tested it using testcodecs, which seemed to work just fine.
Before I issue a PR, I'll turn this change into a macro so it can be reverted with a simple #define, unless you think it's unneeded.
@vadali A macro would be fine.
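For anyone reading along, here is a rough sketch of what such a macro switch could look like. The names USE_ALIGNED_SIMD, MM_LOAD_SI128, MM_STORE_SI128 and add_one_x4 are made up for illustration and are not taken from the library; only the _mm_* intrinsics are real (SSE2, from emmintrin.h).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical switch (not the library's actual macro names):
// define USE_ALIGNED_SIMD to go back to the aligned intrinsics.
#ifdef USE_ALIGNED_SIMD
#define MM_LOAD_SI128 _mm_load_si128    // pointer must be 16-byte aligned
#define MM_STORE_SI128 _mm_store_si128  // pointer must be 16-byte aligned
#else
#define MM_LOAD_SI128 _mm_loadu_si128    // any alignment is accepted
#define MM_STORE_SI128 _mm_storeu_si128  // any alignment is accepted
#endif

// Illustrative helper: adds 1 to four consecutive 32-bit integers.
// With the unaligned variants, `in` and `out` need no special alignment.
inline void add_one_x4(const uint32_t* in, uint32_t* out) {
    __m128i v = MM_LOAD_SI128(reinterpret_cast<const __m128i*>(in));
    v = _mm_add_epi32(v, _mm_set1_epi32(1));
    MM_STORE_SI128(reinterpret_cast<__m128i*>(out), v);
}
```

With the default (unaligned) setting, a call such as add_one_x4(buf + 1, out) is valid even when buf itself is 16-byte aligned, since buf + 1 is then only 4-byte aligned. Building with -DUSE_ALIGNED_SIMD restores the aligned intrinsics, which can fault on such a pointer. Note that _mm_lddqu_si128 is an SSE3 intrinsic (pmmintrin.h), so it is left out of this SSE2-only sketch.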
Hey,
Is there a simple way to get an unaligned codec? I see you have usimdbitpacking.cpp/h, but it doesn't seem easy to make a codec out of it.
By manually sed-ing the loads and stores to _mm_loadu_si128 and _mm_storeu_si128, I verified that on a recent Haswell architecture there is very little difference in performance (YMMV), so it makes sense to use unaligned access in my case.
Thanks for this really awesome library, Shlomi