llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Bad alignment in convertUTF16ToUTF8String #18856

Open 1ba3d143-a64b-4671-82b2-0b31cfb91709 opened 10 years ago

1ba3d143-a64b-4671-82b2-0b31cfb91709 commented 10 years ago
Bugzilla Link 18482
Version trunk
OS All
CC @rnk

Extended Description

lib/Support/ConvertUTFWrapper.cpp contains this code:

    bool convertUTF16ToUTF8String(ArrayRef<char> SrcBytes, std::string &Out) {
      ...
      const UTF16 *Src = reinterpret_cast<const UTF16 *>(SrcBytes.begin());
      const UTF16 *SrcEnd = reinterpret_cast<const UTF16 *>(SrcBytes.end());
      ...
      if (Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED) {

The UTF16 type is normally 2-byte aligned, and there is no guarantee that the ArrayRef points to aligned bytes.

This crashes the unit test ConvertUTFTest.ConvertUTF16LittleEndianToUTF8String on SPARC.

1ba3d143-a64b-4671-82b2-0b31cfb91709 commented 10 years ago

The code generator does the right thing if you give it an unaligned IR load. You can use a 2-byte memcpy or a pointer to a packed struct with a UTF16 member.
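The two options mentioned above could look roughly like the sketch below. It is not the patch that went into LLVM. The `UTF16` alias is a stand-in for LLVM's own typedef, and `loadUTF16`, `PackedUTF16`, `loadUTF16Packed`, and `codeUnitAt` are hypothetical names made up for illustration. The packed-struct variant relies on the GCC/Clang `packed` and `may_alias` attributes, so it is a compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for LLVM's UTF16 typedef (assumption: a 16-bit unsigned integer).
using UTF16 = std::uint16_t;

// Option 1: a 2-byte memcpy into an aligned local. Compilers typically lower
// this to a single load on targets that allow unaligned access and to
// byte-wise loads on strict-alignment targets such as SPARC.
inline UTF16 loadUTF16(const char *P) {
  UTF16 V;
  std::memcpy(&V, P, sizeof(V));
  return V;
}

// Option 2: a pointer to a packed struct with a UTF16 member. The packed
// attribute lowers the struct's alignment to 1, so the compiler emits an
// unaligned load. may_alias lets the struct overlay a plain char buffer.
// Both attributes are GCC/Clang extensions.
struct __attribute__((packed, may_alias)) PackedUTF16 {
  UTF16 Value;
};

inline UTF16 loadUTF16Packed(const char *P) {
  return reinterpret_cast<const PackedUTF16 *>(P)->Value;
}

// Hypothetical helper: read the I-th UTF-16 code unit of a byte buffer
// without assuming anything about the buffer's alignment.
inline UTF16 codeUnitAt(const char *Bytes, std::size_t NumBytes,
                        std::size_t I) {
  assert((I + 1) * sizeof(UTF16) <= NumBytes && "code unit out of range");
  return loadUTF16(Bytes + I * sizeof(UTF16));
}
```

With helpers like these, a check such as `Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED` could be written against the byte pointer directly, for example `loadUTF16(SrcBytes.begin()) == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED`, so no misaligned `const UTF16 *` is ever dereferenced.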

You can probably reproduce this on Intel if you compile with -fsanitize=alignment and make sure the test case passes an unaligned string.
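One way to make sure a test input is unaligned is to copy it to an odd address first. The sketch below does that. `MisalignedBytes` is a hypothetical helper written for illustration, not part of LLVM or its unit tests.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical test helper: owns a copy of the input bytes that is
// guaranteed to start at an odd address, so it is never 2-byte aligned.
class MisalignedBytes {
public:
  MisalignedBytes(const char *Src, std::size_t NumBytes)
      : Storage(NumBytes + 1), Size(NumBytes) {
    // Skip one byte if the allocation starts at an even address.
    Offset =
        (reinterpret_cast<std::uintptr_t>(Storage.data()) % 2 == 0) ? 1 : 0;
    if (NumBytes != 0)
      std::memcpy(Storage.data() + Offset, Src, NumBytes);
  }

  // Copying would reallocate and could change the address parity.
  MisalignedBytes(const MisalignedBytes &) = delete;
  MisalignedBytes &operator=(const MisalignedBytes &) = delete;

  const char *begin() const { return Storage.data() + Offset; }
  const char *end() const { return begin() + Size; }
  std::size_t size() const { return Size; }

private:
  std::vector<char> Storage; // One spare byte for the offset.
  std::size_t Size;
  std::size_t Offset;
};
```

Passing the `begin()`/`end()` range of such an object to `convertUTF16ToUTF8String` in a build compiled with -fsanitize=alignment should make the sanitizer flag the misaligned `UTF16` load, even on hosts where the hardware tolerates it.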

rnk commented 10 years ago

Yep. What's the usual way to do unaligned loads for something on SPARC?