llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[IA] .ascii directive rejects hex escape codes #42912

Open nickdesaulniers opened 5 years ago

nickdesaulniers commented 5 years ago
Bugzilla Link: 43567
Version: trunk
OS: All
CC: @jcai19, @isanbard, @kbeyls, @lalozano, @zygoloid, @stephenhines

Extended Description

$ cat arm.s
.ascii "ARM\x64"
$ clang arm.s
arm.s:1:8: error: invalid escape sequence (unrecognized character) in '.ascii' directive
.ascii "ARM\x64"
       ^
$ aarch64-linux-gnu-as arm.s
$ echo $?
0

It looks like the arm64 Linux kernel embeds a magic string in the kernel image. It is not a NUL-terminated C-style string (hence .ascii rather than .asciz). \x64 is a hexadecimal escape code (0x64 is 'd' in ASCII, but I think it is meant as a cute joke).
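To make the byte layout concrete, here is a minimal C++ sketch of what the two directives would be expected to emit for this string. The helper names asciiMagic and ascizMagic are hypothetical and exist only for illustration; they are not taken from the kernel or from LLVM.

```cpp
#include <string>

// Hypothetical helpers, for illustration only.

// Bytes expected from:  .ascii "ARM\x64"
// Four bytes, no terminator. \x64 is the single byte 0x64, i.e. 'd'.
inline std::string asciiMagic() {
  return std::string("ARM\x64", 4);
}

// Bytes expected from:  .asciz "ARM\x64"
// The same four bytes followed by a trailing NUL (five bytes total).
inline std::string ascizMagic() {
  return std::string("ARM\x64", 5);
}
```

The C++ literal happens to use the same \x escape, which is why the source text of the string looks identical to the assembly operand.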

isanbard commented 5 years ago

https://reviews.llvm.org/D68483

isanbard commented 5 years ago

Ah! I see. GAS reads all of the hex digits and then truncates the result to the lower 8 bits:

#define CHAR_MASK (0xff)

...
c = number & CHAR_MASK;
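A rough sketch of that behavior in C++, assuming the semantics described in this thread (consume every trailing hex digit, keep only the low 8 bits). The function name parseHexEscape and its signature are hypothetical; this is neither the GAS code nor the LLVM patch, and returning false when no digit follows \x is an assumption of this sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Str[Pos] is the first character after "\x". On success, Pos is advanced
// past all consecutive hex digits and Out holds the low 8 bits of the value.
// Returns false (leaving Pos unchanged) if no hex digit is present.
inline bool parseHexEscape(const std::string &Str, std::size_t &Pos,
                           unsigned char &Out) {
  constexpr unsigned CharMask = 0xff;
  const std::size_t Start = Pos;
  unsigned Value = 0;

  while (Pos < Str.size() &&
         std::isxdigit(static_cast<unsigned char>(Str[Pos]))) {
    const unsigned char C = static_cast<unsigned char>(Str[Pos++]);
    const unsigned Digit =
        std::isdigit(C) ? static_cast<unsigned>(C - '0')
                        : static_cast<unsigned>(std::tolower(C) - 'a' + 10);
    // Masking on every step gives the same result as masking once at the
    // end, and keeps Value from overflowing on very long digit runs.
    Value = ((Value << 4) | Digit) & CharMask;
  }

  if (Pos == Start)
    return false;

  Out = static_cast<unsigned char>(Value);
  return true;
}
```

With this approach the result always fits in an unsigned char: "64" yields 0x64 ('d'), while a longer run such as "1234" yields 0x34.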

isanbard commented 5 years ago

In a string, it looks like the escaped hex sequence can be more than just 2 characters:

\x hex-digits... A hex character code. All trailing hex digits are combined. Either upper or lower case x works.

I'm not sure how to represent that as an "unsigned char", which is what it looks like is needed...

isanbard commented 5 years ago

It should be easy to add. In llvm/lib/MC/MCParser/AsmParser.cpp:

bool AsmParser::parseEscapedString(std::string &Data) {
...

// Recognize escaped characters. Note that this escape semantics currently
// loosely follows Darwin 'as'. Notably, it doesn't support hex escapes.