Closed gegogi closed 3 years ago
I am just writing a new comment to check whether this is a bug or not. Is Microsoft still maintaining the code?
Sorry, I missed this bug report. I'll take a look at it for a future release.
The same problem exists in the load functions for XMUNIBBLE4, XMU555, XMU565, XMBYTEN2, XMBYTE2, XMUBYTEN2, and XMUBYTE2.
Note that the _mm_loadu_si16 intrinsic is the right choice here, but it's only defined in VS 2017, clang v8, and GNUC 11 or later. This will cause problems with GNUC 9/10 scenarios on WSL.
Addressed the issue with GNUC 9 and 10 in this commit.
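For readers on a compiler that lacks `_mm_loadu_si16`, one portable way to get a 16-bit load is sketched below. This is only an illustration and not necessarily what the commit does; `LoadU16Portable` is a hypothetical name, not a DirectXMath function.

```cpp
#include <cstdint>
#include <cstring>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical helper (not part of DirectXMath): loads exactly two bytes
// from p and places them, zero-extended, in the low lane of an __m128i.
inline __m128i LoadU16Portable(const void* p)
{
    std::uint16_t v;
    std::memcpy(&v, p, sizeof(v));                 // reads 2 bytes, no alignment requirement
    return _mm_cvtsi32_si128(static_cast<int>(v)); // low 32 bits = v, upper bits zeroed
}
```

Because `memcpy` touches only `sizeof(uint16_t)` bytes, the load never reaches past the packed element, regardless of which compiler is in use.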
The load functions use _mm_load_ps1(const float*) in the SSE implementation. At the end of a memory block, this can read up to two bytes past the end of the buffer, because XMUNIBBLE4 and XMU555 are packed 16-bit types while the load reads four bytes.
I bumped into a crash while converting a tightly packed RGBA4444 image to an RGBA8888 image using DirectXTex, and it looks like it happens while loading the final scanline of the source image. Tracing the crash led me to this SSE code.
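The size of the over-read can be worked out with a little arithmetic. The sketch below is only an illustration; `OverreadBytes` is a hypothetical helper, and the 256-pixel scanline width is an assumed example value, not taken from the report.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes read beyond a tightly packed buffer
// of `count` elements when the last element is fetched with a load that is
// `loadSize` bytes wide.
constexpr std::size_t OverreadBytes(std::size_t count,
                                    std::size_t elemSize,
                                    std::size_t loadSize)
{
    if (count == 0)
        return 0; // nothing is loaded from an empty buffer

    const std::size_t bufferSize = count * elemSize;       // total bytes owned
    const std::size_t lastOffset = (count - 1) * elemSize; // start of last element
    const std::size_t loadEnd    = lastOffset + loadSize;  // one past last byte read

    return loadEnd > bufferSize ? loadEnd - bufferSize : 0;
}
```

For an assumed 256-pixel RGBA4444 scanline (2 bytes per pixel), `OverreadBytes(256, 2, 4)` gives 2 for a 4-byte float load, while a 2-byte load gives 0. If the buffer ends at a page boundary, those two extra bytes can fall in unmapped memory, which matches a crash on the final scanline.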