stb_c_lexer: Adding support for octal and hexadecimal chars

danil-kondr2016 commented 1 year ago

This commit adds support for octal and hexadecimal character codes in string and char literals.

nothings commented 1 year ago

if I'm reading the diffs right, there are multiple problems that should be straightforward to fix

in octal parsing, the first if() statement doesn't update *q, so a one-character octal value won't be parsed properly inside a string (e.g. "\1foo"), which actually breaks the previously-supported case of \0
in octal parsing, the if()s aren't nested so digits later in a string can mess it up (e.g. try "\1f2")
in hex parsing, the if()s aren't nested so digits later in a string can mess it up (e.g. try "\x1g2")
hex character size is implementation-defined in C, so a hex escape isn't limited to 3 characters. should just do a full-sized hex parse and error if it overflows. First check the variable you're computing for overflow as you go, then check again that the final value fits in the type used for the literal: for character literals that's long int_number, and for string literals it's sizeof(char), so just check that the result fits in 8 bits (but use the full parse). Note that this overflow checking is NOT done elsewhere in stb_c_lexer, but we should improve that, so this is a good place to start.
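The octal points above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR or from stb_c_lexer: `parse_octal_escape` is a hypothetical helper, and the convention that `p` points just past the backslash is an assumption. A single loop stops at the first non-octal character, so later digits in the string cannot affect the result, and the end pointer is always updated on success.

```c
/* Hypothetical helper, not the stb_c_lexer code.
   p points at the first character after the backslash.
   Reads 1 to 3 octal digits, stores the position just past them in *q,
   and returns the value. Returns -1 (leaving *q alone) if p does not
   start with an octal digit. */
static int parse_octal_escape(const char *p, const char **q)
{
    int value = 0;
    int digits = 0;

    /* Stop at the first non-octal character or after three digits. */
    while (digits < 3 && *p >= '0' && *p <= '7') {
        value = value * 8 + (*p - '0');
        ++p;
        ++digits;
    }
    if (digits == 0)
        return -1;

    *q = p; /* end position is set even for a one-digit escape */
    return value;
}
```

With this shape, the text `1foo` (from `"\1foo"`) gives the value 1 with `*q` pointing at `foo`, and `1f2` gives the value 1 with `*q` pointing at `f2`.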

style improvements:

should only write to *q once, at the end, for clarity in both octal and hex constants; it might be clearest to advance p as you read and then do *q = p.
although it's inconsistent with the value types, should probably parse the initial hex value in an unsigned type, since that's a more natural fit for hex constants
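A rough sketch of a hex parse along these lines, again with hypothetical names (`parse_hex_escape`, `fits_in_8_bits`, `fits_in_long`) rather than anything from stb_c_lexer. It assumes `p` points just past `\x` and that the letters `a` to `f` are contiguous in the character set, as in ASCII. The value is accumulated in an unsigned type, overflow is checked before each shift, and `*q` is written once at the end.

```c
#include <limits.h>

/* Hypothetical helper, not the stb_c_lexer code.
   p points at the first character after "\x".
   Consumes every hex digit it finds. Returns 1 and fills *out and *q on
   success; returns 0 if there is no hex digit or the value overflows. */
static int parse_hex_escape(const char *p, const char **q, unsigned long *out)
{
    unsigned long value = 0;
    int digits = 0;

    for (;;) {
        unsigned long digit;
        if (*p >= '0' && *p <= '9')
            digit = (unsigned long)(*p - '0');
        else if (*p >= 'a' && *p <= 'f')
            digit = (unsigned long)(*p - 'a' + 10);
        else if (*p >= 'A' && *p <= 'F')
            digit = (unsigned long)(*p - 'A' + 10);
        else
            break; /* first non-hex character ends the escape */

        /* First check: would shifting in four more bits overflow? */
        if (value > (ULONG_MAX >> 4))
            return 0;
        value = (value << 4) | digit;
        ++p;
        ++digits;
    }
    if (digits == 0)
        return 0;

    *q = p;       /* single write to the end pointer */
    *out = value;
    return 1;
}

/* Second check: does the parsed value fit where it will be stored? */
static int fits_in_8_bits(unsigned long v) { return v <= 0xFFul; }
static int fits_in_long(unsigned long v)   { return v <= (unsigned long)LONG_MAX; }
```

A caller would run `fits_in_8_bits` for an escape inside a string literal and `fits_in_long` for a character literal, reporting an error when either the parse or the fit check fails.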

danil-kondr2016 commented 1 year ago

Thank you for the feedback. I'll fix these errors.

nothings commented 1 year ago

Note that it looks like STB_C_LEXER_SELF_TEST just prints back the input string for string literals and char literals, which means you can't tell from that test that they're computing the wrong value in those cases. If the end location of a char literal is misparsed, that may or may not be recognized, and getting the end of a char escape wrong inside of a string literal will not be visible either. So you really need to test char literals, and maybe modify the output in print_token to print the value of int_number in parentheses, i.e. case CLEX_charlit : printf("'%s'(%ld)", lexer->string, lexer->int_number);
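To make the computed value checkable, the suggested output format could be sketched like this. The struct is a stand-in that only mirrors the two fields mentioned above (`string` and the `long` `int_number`); it is not the real stb_lexer definition, and `format_charlit` is a hypothetical helper. It also assumes `string` holds the literal's text as the self-test currently prints it.

```c
#include <stdio.h>

/* Stand-in for the two lexer fields used here; NOT the real stb_lexer. */
typedef struct {
    const char *string; /* literal text, as the self-test prints it */
    long int_number;    /* computed value of the char literal */
} demo_lexer;

/* Hypothetical helper: formats a char literal as 'text'(value) so a
   wrong computed value shows up in the test output.
   Returns what snprintf returns. */
static int format_charlit(char *buf, size_t size, const demo_lexer *lexer)
{
    /* int_number is a long, so the conversion is %ld */
    return snprintf(buf, size, "'%s'(%ld)", lexer->string, lexer->int_number);
}
```

For a literal whose text is `\x41` and whose value is 65, this produces `'\x41'(65)`, so a misparsed escape would show a different number even though the echoed text looks right.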

danil-kondr2016 commented 1 year ago

I have fixed the errors you pointed out.

nothings / stb

stb_c_lexer: Adding support for octal and hexadecimal chars #1380