Open thrasibule opened 4 years ago
I've made some progress on this.
Thanks for letting us know and sharing your results. It does sound like a compiler bug, which is unfortunately not uncommon in my experience.
The AVX2 issue was because I was running it in VirtualBox: even though VirtualBox passes the avx2 flag through from the host, it doesn't pass through fma and bmi, which are also required.
Yes, this is unfortunate. We also ran into that with JPEG XL. There the FMA is helpful but for HighwayHash we could remove those extra flag requirements.
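For anyone hitting the same thing in a VM, here is a minimal sketch of how one could check which of these flags the guest actually exposes. It relies on the GCC/clang builtin `__builtin_cpu_supports`; the struct and function names (`CpuFeatures`, `QueryCpuFeatures`, `CanRunAvx2Target`) are made up for illustration and are not HighwayHash's own dispatch code. Which BMI level is required is also an assumption here: the sketch only checks the "bmi" feature string.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper type, not part of HighwayHash.
struct CpuFeatures {
  bool avx2;
  bool fma;
  bool bmi;
};

// Queries the running CPU (or VM guest) via compiler builtins.
// On non-x86 targets or other compilers, all fields stay false.
inline CpuFeatures QueryCpuFeatures() {
  CpuFeatures f = {false, false, false};
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  f.avx2 = __builtin_cpu_supports("avx2") != 0;
  f.fma = __builtin_cpu_supports("fma") != 0;
  f.bmi = __builtin_cpu_supports("bmi") != 0;
#endif
  return f;
}

// Assumption taken from this thread: the AVX2 target needs all three flags.
inline bool CanRunAvx2Target(const CpuFeatures& f) {
  return f.avx2 && f.fma && f.bmi;
}

// Prints one line per flag so the missing one is easy to spot.
inline void PrintCpuReport() {
  const CpuFeatures f = QueryCpuFeatures();
  // Sanity check: the target can only be usable if AVX2 itself is present.
  assert(!CanRunAvx2Target(f) || f.avx2);
  std::printf("avx2=%d fma=%d bmi=%d -> AVX2 target usable: %d\n", f.avx2,
              f.fma, f.bmi, CanRunAvx2Target(f));
}
```

Calling `PrintCpuReport()` inside the VirtualBox guest and on the host should make the difference visible, assuming the builtin reflects what the hypervisor exposes through CPUID.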
Interesting that -O3 is not the same as its constituent flags (at least as defined at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html - I verified it's the same list). Would you like to report this as a potential bug to GCC?
I've checked that it also works with -O3 -fno-strict-aliasing, but I don't get any aliasing warnings, so it must be something quite subtle. Does it ring a bell? It also works with gcc 9.3. I'm a bit wary to report it to gcc without narrowing down the issue further.
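Since -fno-strict-aliasing makes the failure go away, one thing that might be worth looking for is a place where bytes are read through a pointer cast to a wider type. GCC's aliasing warnings do not catch every such case. Below is a minimal sketch of the usual `memcpy`-based alternative; `LoadU64` and `StoreU64` are hypothetical names for illustration, and this is not a claim about where the problem in HighwayHash actually is.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads 8 bytes from arbitrary (possibly unaligned) memory.
// memcpy is allowed to access any object's bytes, so this stays valid
// under strict aliasing; compilers typically lower it to a single load.
inline uint64_t LoadU64(const void* from) {
  uint64_t v;
  std::memcpy(&v, from, sizeof(v));
  return v;
}

// Writes 8 bytes to arbitrary memory, again without a pointer cast.
inline void StoreU64(uint64_t v, void* to) {
  std::memcpy(to, &v, sizeof(v));
}

// Round-trips a byte buffer through the helpers and checks it is unchanged.
inline bool RoundTripOk() {
  const unsigned char in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  unsigned char out[8] = {0, 0, 0, 0, 0, 0, 0, 0};
  StoreU64(LoadU64(in), out);
  const bool same = std::memcmp(in, out, sizeof(in)) == 0;
  assert(same);
  return same;
}
```

The pattern to compare against is something like `*reinterpret_cast<const uint64_t*>(bytes)`, which is undefined behavior when `bytes` does not actually point at a `uint64_t` and can be miscompiled at -O3 without any warning.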
The test fails on my machine with:
Mismatch at size 33 for target Portable.
This is with gcc 10.2, see full version:
If I compile with clang, the test passes, but only tries SSE41, not AVX2 even though my cpu supports it.
Any idea how to debug this?