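The load functions for the following packed types all read 32 bits even though the data types are only 16 bits wide. This mostly works, except when the value sits at the end of a tightly sized buffer, where the extra 2 bytes are read past the end of the allocation.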
XMUNIBBLE4
XMU555
XMU565
XMBYTEN2
XMBYTE2
XMUBYTEN2
XMUBYTE2
I changed the loads from _mm_load1_ps to _mm_loadu_si16, which reads only 16 bits. This is still SSEv1. Note that the Store functions never had this problem.
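As a rough sketch of the idea (this is not the actual DirectXMath code; `Packed16` and `LoadPacked16` are made-up names used only for illustration), a load that touches exactly 2 bytes could look something like this:

```cpp
#include <cassert>
#include <cstdint>
#include <immintrin.h>  // _mm_loadu_si16, _mm_cvtsi128_si32

// Hypothetical stand-in for a 16-bit packed type such as XMU565.
struct Packed16
{
    uint16_t v;
};

// Hypothetical helper: reads exactly 2 bytes from pSource.
// _mm_loadu_si16 places the 16-bit value in the low lane and zeroes the
// rest of the register, so nothing beyond the 2 bytes is touched.
inline uint32_t LoadPacked16(const Packed16* pSource)
{
    assert(pSource != nullptr);
    const __m128i raw = _mm_loadu_si16(&pSource->v);
    return static_cast<uint32_t>(_mm_cvtsi128_si32(raw));
}
```

With a tight array such as `Packed16 buf[2]`, calling `LoadPacked16(&buf[1])` reads only the last 2 bytes of the buffer, whereas a 32-bit load from the same address would read 2 bytes past the end.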