Open 1ba3d143-a64b-4671-82b2-0b31cfb91709 opened 10 years ago
The code generator does the right thing if you give it an unaligned IR load. You can use a 2-byte memcpy or a pointer to a packed struct with a UTF16 member.
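A minimal sketch of the 2-byte memcpy approach mentioned above, assuming a 16-bit code unit. The helper names `loadUnaligned16` and `startsWithSwappedBOM` are hypothetical and not part of LLVM, and the constant 0xFFFE is written out on the assumption that it matches UNI_UTF16_BYTE_ORDER_MARK_SWAPPED.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an LLVM API): read one 16-bit code unit from a
// byte pointer that may not be 2-byte aligned. The memcpy makes no alignment
// assumption, so the compiler is free to emit whatever load sequence is safe
// on the target.
inline std::uint16_t loadUnaligned16(const char *P) {
  assert(P != nullptr && "need at least two readable bytes");
  std::uint16_t V;
  std::memcpy(&V, P, sizeof(V));
  return V;
}

// Hypothetical helper: check the first code unit for a byte-swapped BOM
// without ever forming a misaligned UTF16 pointer.
// Assumption: 0xFFFE is the value of UNI_UTF16_BYTE_ORDER_MARK_SWAPPED.
inline bool startsWithSwappedBOM(const char *Bytes, std::size_t Size) {
  return Size >= 2 && loadUnaligned16(Bytes) == 0xFFFE;
}
```

With a helper like this, the `Src[0]` comparison could in principle be replaced by a call on the raw byte pointer, so the `reinterpret_cast` to `const UTF16 *` is no longer needed for that check.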
You can probably reproduce this on Intel if you compile with -fsanitize=alignment and make sure the test case passes an unaligned string.
Yep. What's the usual way to do unaligned loads for something on SPARC?
Extended Description
lib/Support/ConvertUTFWrapper.cpp contains this code:
bool convertUTF16ToUTF8String(ArrayRef<char> SrcBytes, std::string &Out) {
  ...
  const UTF16 *Src = reinterpret_cast<const UTF16 *>(SrcBytes.begin());
  const UTF16 *SrcEnd = reinterpret_cast<const UTF16 *>(SrcBytes.end());
  ...
  if (Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED) {
The UTF16 type is normally 2-byte aligned, and there is no guarantee that the ArrayRef points to aligned bytes.
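To make the alignment concern concrete, here is a hedged sketch of how a caller might detect a misaligned byte buffer and copy it into aligned storage before treating it as 16-bit code units. Both function names are hypothetical, and `std::uint16_t` stands in for the UTF16 type; this is one possible workaround, not necessarily the fix adopted for this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true if P is suitably aligned for a 16-bit code unit.
inline bool isAlignedFor16(const void *P) {
  return reinterpret_cast<std::uintptr_t>(P) % alignof(std::uint16_t) == 0;
}

// Hypothetical helper: copy a byte range into a vector of 16-bit units.
// The vector's storage is correctly aligned, so indexing it is always safe.
// Precondition: Size is a multiple of two.
inline std::vector<std::uint16_t> copyToAlignedUnits(const char *Bytes,
                                                     std::size_t Size) {
  assert(Size % 2 == 0 && "expected a whole number of 16-bit code units");
  std::vector<std::uint16_t> Units(Size / 2);
  if (Size != 0)
    std::memcpy(Units.data(), Bytes, Size);
  return Units;
}
```

The copy costs an allocation, so a caller might take this path only when `isAlignedFor16` returns false and use the original pointer otherwise.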
This crashes the unit test ConvertUTFTest.ConvertUTF16LittleEndianToUTF8String on SPARC.