hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[BUG] Error parsing UTF-8 character literal that is not a hex character #1142

Open bluetarpmedia opened 1 week ago

bluetarpmedia commented 1 week ago

Describe the bug

cppfront produces an error when parsing a UTF-8 character literal (u8 prefix) whose character is not a hexadecimal digit. Literals for a through f are accepted, but u8'g' is rejected.

To Reproduce

Run cppfront on this code:

main: () -> int = {

    a:= u8'a';  // ok
    b:= u8'b';  // ok
    c:= u8'c';  // ok
    d:= u8'd';  // ok
    e:= u8'e';  // ok
    f:= u8'f';  // ok
    g:= u8'g';  // error: line ended before character literal was terminated

    return 0;
}

Repro

sookach commented 1 week ago

Pardon my ignorance, but isn't u8 an unsigned 8-bit integer, not a UTF-8 character literal?

bluetarpmedia commented 5 days ago

Yeah, Cpp2 has the type u8 (which lowers to cpp2::u8), but C++17 introduced UTF-8 character literals, so you can also write u8'a'.

https://en.cppreference.com/w/cpp/language/character_literal

From my reading of the lexer, Cpp2 does support it: https://github.com/hsutter/cppfront/blob/a76e23b74f91ccec68336ddd5f84edb5b5216a7e/source/lex.h#L1190

hsutter commented 5 days ago

Thanks! I'll take a look.

I hadn't noticed that the literal prefix and the unsigned type alias used the same name. Interesting!