nickdesaulniers opened 5 years ago
Ah! I see. GAS reads all of the hex digits and then truncates the result to the lower 8 bits (one byte):
    ...
    c = number & CHAR_MASK;
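To make the GAS behavior concrete, here is a minimal C++ sketch of that digit loop. It is not GAS's actual code. Every hex digit after `\x` is accumulated, and only the low byte survives the mask. `decodeHexEscape`, `HexEscape`, `hexDigitValue` and `kCharMask` are hypothetical names, and the sketch assumes `CHAR_MASK` is `0xff`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for GAS's CHAR_MASK; assumed to be 0xff (one byte).
constexpr unsigned long kCharMask = 0xff;

// Value of one hex digit, or -1 if `c` is not a hex digit.
constexpr int hexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical result type: the emitted byte plus how many digits were read.
struct HexEscape {
  unsigned char value;
  std::size_t digits;
};

// `s` starts just after the "\x". Keep reading while the next character is
// a hex digit (greedy, like GAS), then truncate the accumulated number.
constexpr HexEscape decodeHexEscape(std::string_view s) {
  unsigned long number = 0;  // unsigned: long runs wrap, low byte unaffected
  std::size_t i = 0;
  while (i < s.size() && hexDigitValue(s[i]) >= 0) {
    number = number * 16 + static_cast<unsigned long>(hexDigitValue(s[i]));
    ++i;
  }
  return {static_cast<unsigned char>(number & kCharMask), i};
}

// "\x64" followed by the closing quote: one byte, 'd'.
static_assert(decodeHexEscape("64\"").value == 'd');
// "\x1234": all four digits are consumed, but only 0x34 is kept.
static_assert(decodeHexEscape("1234").value == 0x34);
static_assert(decodeHexEscape("1234").digits == 4);
```

Because the loop is greedy, `"\x64ab"` would be read as one escape (`0x64ab`, truncated to `0xab`) rather than `'d'` followed by `"ab"`.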
In a string, it looks like an escaped hex sequence can be longer than just 2 hex digits. From the GAS manual:
\x hex-digits... A hex character code. All trailing hex digits are combined. Either upper or lower case x works.
I'm not sure how to represent a value that wide as an "unsigned char", which looks like what is needed...
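On the "unsigned char" question: in C and C++, converting an out-of-range integer to an unsigned type is well defined and reduces it modulo 2^N. With 8-bit chars, a plain cast therefore yields the same byte as masking with `0xff`. The sketch below assumes `CHAR_BIT == 8` and checks this at compile time. `toByte` is a hypothetical name.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

// Hypothetical helper: the combined value of a "\x..." escape, as one byte.
// Unsigned conversion wraps modulo 256, so this equals `number & 0xff`.
constexpr unsigned char toByte(unsigned long number) {
  return static_cast<unsigned char>(number);
}

static_assert(toByte(0x64) == 'd');
static_assert(toByte(0x1234) == 0x34);
static_assert(toByte(0x1234) == (0x1234 & 0xff));
```

So an assembler that wants GAS's result does not need a wider character type. It can accumulate into an integer and convert (or mask) once at the end.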
It should be easy to add. In llvm/lib/MC/MCParser/AsmParser.cpp:

    bool AsmParser::parseEscapedString(std::string &Data) {
      ...
      // Recognize escaped characters. Note that this escape semantics currently
      // loosely follows Darwin 'as'. Notably, it doesn't support hex escapes.
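Here is a rough sketch of what such a change might look like: a standalone C++ function that handles a few common escapes plus GAS-style `\x` escapes. It deliberately does not use LLVM's lexer or `AsmParser` APIs. `parseEscapedStringSketch` and its `(input, Data) -> bool` signature are hypothetical, and the non-hex escapes shown are illustrative only, not the full set LLVM accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

namespace {
// Value of one hex digit, or -1 if `c` is not a hex digit.
int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}
}  // namespace

// Hypothetical sketch: decode the body of a quoted string (without the
// quotes) into `Data`. Returns false on an unrecognized escape.
bool parseEscapedStringSketch(std::string_view In, std::string &Data) {
  Data.clear();
  for (std::size_t i = 0; i < In.size(); ++i) {
    if (In[i] != '\\') {
      Data += In[i];
      continue;
    }
    if (++i == In.size())
      return false;  // trailing backslash
    char E = In[i];
    if (E == 'x' || E == 'X') {
      // GAS semantics: combine all following hex digits, keep the low byte.
      unsigned long Value = 0;
      std::size_t Digits = 0;
      while (i + 1 < In.size() && hexValue(In[i + 1]) >= 0) {
        Value = Value * 16 + static_cast<unsigned long>(hexValue(In[i + 1]));
        ++i;
        ++Digits;
      }
      if (Digits == 0)
        return false;  // "\x" with no digits: treated as an error in this sketch
      Data += static_cast<char>(Value & 0xff);
      continue;
    }
    switch (E) {
    case '\\': Data += '\\'; break;
    case '"':  Data += '"';  break;
    case 'n':  Data += '\n'; break;
    case 't':  Data += '\t'; break;
    default:   return false;  // unrecognized escape
    }
  }
  assert(Data.size() <= In.size() && "escapes never expand the input");
  return true;
}
```

The sketch copies GAS's greedy digit consumption so that sources written for GAS produce the same bytes. Whether LLVM should match GAS exactly here is a separate decision.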
Extended Description
    $ cat arm.s
    .ascii "ARM\x64"
    $ clang arm.s
    arm.s:1:8: error: invalid escape sequence (unrecognized character) in '.ascii' directive
    .ascii "ARM\x64"
           ^
    $ aarch64-linux-gnu-as arm.s
    $ echo $?
    0
It looks like the arm64 Linux kernel embeds a magic string in the kernel image. It is not a null-terminated C-style string (hence .ascii, not .asciz). \x64 is a hexadecimal escape code: 0x64 is 'd' in ASCII, so the emitted bytes are "ARMd". I think the source spelling "ARM\x64" is meant as a cute joke.
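To check what `.ascii "ARM\x64"` emits: four bytes, `A R M d`, with no trailing NUL. Read as a little-endian 32-bit word, they give 0x644d5241, which matches the `magic` field listed in the Linux arm64 booting documentation. `kArm64Magic` and `loadLE32` are hypothetical names used only for this sketch.

```cpp
#include <array>
#include <cstdint>

// Hypothetical constant: the four bytes `.ascii "ARM\x64"` emits.
// There is no terminating NUL, unlike what .asciz would produce.
constexpr std::array<unsigned char, 4> kArm64Magic = {'A', 'R', 'M', 0x64};

// Hypothetical helper: build a 32-bit value from bytes in little-endian order.
constexpr std::uint32_t loadLE32(const std::array<unsigned char, 4> &b) {
  return std::uint32_t{b[0]} | std::uint32_t{b[1]} << 8 |
         std::uint32_t{b[2]} << 16 | std::uint32_t{b[3]} << 24;
}

static_assert(kArm64Magic[3] == 'd');  // 0x64 is ASCII 'd'
static_assert(loadLE32(kArm64Magic) == 0x644d5241);
// A C++ string literal adds a NUL byte, as .asciz would.
static_assert(sizeof("ARM\x64") == 5);
```

C and C++ string literals also treat `\x` greedily, consuming hex digits until the first non-hex character. `"ARM\x64"` is unambiguous only because the escape is the last thing in the literal.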