According to cppreference, identifiers are (XID_Start | '_') XID_Continue*, which is the case as of C++23 and C2x. I have confirmed this myself with the drafts of C++23 and C2x.
https://en.cppreference.com/w/cpp/language/identifiers
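
As a rough illustration of the rule above (not the grammar change itself), Python's built-in `str.isidentifier()` happens to apply the same shape, an XID_Start character or underscore followed by XID_Continue characters, so it can be used to spot-check candidate identifiers. The helper name `is_xid_identifier` and the sample strings are made up for this sketch, and results for newly added characters depend on the Unicode tables bundled with the Python version in use.

```python
def is_xid_identifier(text: str) -> bool:
    """Check text against (XID_Start | '_') XID_Continue*.

    Hypothetical helper for illustration. It relies on str.isidentifier(),
    which uses the Unicode data shipped with the running Python and does
    not reject keywords.
    """
    return text.isidentifier()


# Made-up samples: the first four should match, the last three should not.
samples = ["foo", "_bar1", "naïve", "变量", "1abc", "a-b", ""]
for sample in samples:
    print(repr(sample), is_xid_identifier(sample))
```

Note that this check does not cover compiler extensions such as `$` in identifiers or universal character names like `\u00e9`.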
Clang indeed implements identifiers as (XID_Start | '_') XID_Continue* in C++ mode and C2x mode, with a slight extension to the character set to include some extra math characters: https://github.com/llvm/llvm-project/blob/231992d9b88fe4e0b4aa0f55ed64d7ba88b231ce/clang/lib/Lex/Lexer.cpp#L1517-L1530

I have checked the performance impact of this change by timing tree-sitter parse on all the C files in llvm-project before and after the change. The difference seems negligible to nonexistent: the new total is within 0.1 seconds of the previous total runtime of roughly 6 seconds.
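
A measurement like this could be reproduced with a small script along the following lines. This is only a sketch under assumptions: the helper names `collect_c_files` and `time_command` are hypothetical, the checkout path is whatever you pass in, and it is not the exact procedure used for the numbers above.

```python
import subprocess
import time
from pathlib import Path


def collect_c_files(root):
    """Return a sorted list of every *.c file found under root."""
    return sorted(str(path) for path in Path(root).rglob("*.c"))


def time_command(argv):
    """Run argv once, discarding its output, and return wall-clock seconds."""
    start = time.perf_counter()
    # Parse errors make the command exit non-zero, so do not raise on failure.
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.perf_counter() - start
```

With a local llvm-project checkout, `time_command(["tree-sitter", "parse", *collect_c_files("llvm-project")])` can be run once on each build of the grammar and the two totals compared. Averaging a few runs would help separate a real difference from noise at the 0.1 second scale.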