nothings / stb

stb single-file public domain libraries for C/C++
https://twitter.com/nothings

Fix AVX and AVX2 corruption due to undefined upper 128 bits after cal… #1576

Closed: popizdeh closed this pull request 7 months ago

popizdeh commented 7 months ago

When stbir__1_coeff_remnant and stbir__store_output are expanded, the following happens:

stbir__simdf8_madd_mem4( tot0, tot0, t, ); will corrupt the upper 128 bits of tot0, because _mm256_castps128_ps256 leaves the upper 128 bits of the widened value undefined. stbir__simdf8_add4halves( t, t, tot0 ); then reads those bits via _mm256_extractf128_ps( tot0, 1 ).
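To illustrate the general pattern (this is not the code from stb or from this PR), here is a minimal sketch of widening a 128-bit value with the upper half explicitly zeroed, so that a later "add the two halves" step only sees the intended data. The names widen_zero_upper, accumulate_and_fold and fold_demo are hypothetical. It assumes GCC or Clang on x86 (it uses __attribute__((target("avx"))) and __builtin_cpu_supports so it builds without -mavx). _mm256_insertf128_ps into a zeroed register is one portable way to zero-extend; newer compilers also offer _mm256_zextps128_ps256 for the same purpose.

```c
#include <immintrin.h>

/* Hypothetical helper (not stb code): widen a 128-bit vector to 256 bits
   with the upper 128 bits guaranteed to be zero.
   _mm256_castps128_ps256(lo) would instead leave the upper half undefined. */
__attribute__((target("avx")))
static __m256 widen_zero_upper(__m128 lo)
{
    /* Insert lo into lane 0 of an all-zero register; lane 1 stays zero. */
    return _mm256_insertf128_ps(_mm256_setzero_ps(), lo, 0);
}

/* Accumulate a 128-bit value into a 256-bit register, then add the two
   128-bit halves together, similar in shape to the madd-then-add4halves
   sequence described above. */
__attribute__((target("avx")))
static void accumulate_and_fold(const float in[4], float out[4])
{
    __m256 tot = _mm256_setzero_ps();
    tot = _mm256_add_ps(tot, widen_zero_upper(_mm_loadu_ps(in)));
    __m128 lo = _mm256_castps256_ps128(tot);   /* lower 128 bits */
    __m128 hi = _mm256_extractf128_ps(tot, 1); /* upper 128 bits, zero here */
    _mm_storeu_ps(out, _mm_add_ps(lo, hi));
}

/* Hypothetical wrapper: returns 1 and fills out[] if the CPU supports AVX,
   otherwise returns 0 without touching out[]. */
int fold_demo(const float in[4], float out[4])
{
    if (!__builtin_cpu_supports("avx"))
        return 0;
    accumulate_and_fold(in, out);
    return 1;
}
```

Because the upper half is zeroed, folding the halves returns the input unchanged. With the plain cast, the fold could add whatever garbage happened to be in the upper half of the register.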

I observed image corruption when scaling by a factor of 1/9, but it is very difficult to reproduce. My guess is that the compiler inserts VZEROUPPER at various places, so the upper 128 bits are zero most of the time, but in some situations the upper half of the register holds random values.

I haven't seen the bug with FMA enabled, but I've applied the fix there too, since that code has the same issue. The same goes for stbir__simdf8_add4: no problem was observed, but it's the only other place where _mm256_castps128_ps256 is used.