llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[libc] Make char parsing code encoding independent #109841

Open michaelrj-google opened 2 hours ago

michaelrj-google commented 2 hours ago

What: Some of our string parsing code assumes ASCII, for example the string-to-integer code: https://github.com/llvm/llvm-project/blob/main/libc/src/__support/str_to_integer.h#L37 We should make this code encoding independent, likely by using switch statements.

Why change it: Easier support for non-ASCII character encodings (e.g. wide character, EBCDIC).

Will it be bad for performance: No, it might actually be better. Clang is very good at optimizing this sort of switch statement: https://godbolt.org/z/qvrebqvvr
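
To make the proposal concrete, here is a minimal sketch of what a switch-based digit lookup could look like. It is not the current LLVM libc code, and the names `char_to_digit` and `digit_in_base` are hypothetical. The point is that arithmetic such as `c - 'a' + 10` relies on letters being contiguous, which is true in ASCII but not in EBCDIC, whereas comparing against character literals makes no assumption about their numeric values.

```cpp
// Hypothetical sketch; names are illustrative and not taken from LLVM libc.

// Returns the value of `c` as a base-36 digit (0-35), or -1 if `c` is not
// a digit or a letter. Only character literals are compared, so no ordering
// or contiguity of the execution character set is assumed.
constexpr int char_to_digit(char c) {
  switch (c) {
  case '0': return 0;
  case '1': return 1;
  case '2': return 2;
  case '3': return 3;
  case '4': return 4;
  case '5': return 5;
  case '6': return 6;
  case '7': return 7;
  case '8': return 8;
  case '9': return 9;
  case 'a': case 'A': return 10;
  case 'b': case 'B': return 11;
  case 'c': case 'C': return 12;
  case 'd': case 'D': return 13;
  case 'e': case 'E': return 14;
  case 'f': case 'F': return 15;
  case 'g': case 'G': return 16;
  case 'h': case 'H': return 17;
  case 'i': case 'I': return 18;
  case 'j': case 'J': return 19;
  case 'k': case 'K': return 20;
  case 'l': case 'L': return 21;
  case 'm': case 'M': return 22;
  case 'n': case 'N': return 23;
  case 'o': case 'O': return 24;
  case 'p': case 'P': return 25;
  case 'q': case 'Q': return 26;
  case 'r': case 'R': return 27;
  case 's': case 'S': return 28;
  case 't': case 'T': return 29;
  case 'u': case 'U': return 30;
  case 'v': case 'V': return 31;
  case 'w': case 'W': return 32;
  case 'x': case 'X': return 33;
  case 'y': case 'Y': return 34;
  case 'z': case 'Z': return 35;
  default: return -1;
  }
}

// Returns the digit value of `c` if it is a valid digit in `base` (2-36),
// otherwise -1.
constexpr int digit_in_base(char c, int base) {
  const int d = char_to_digit(c);
  return (d >= 0 && d < base) ? d : -1;
}

// Compile-time checks of the sketch above.
static_assert(char_to_digit('7') == 7, "decimal digit");
static_assert(char_to_digit('f') == 15 && char_to_digit('F') == 15, "hex digit");
static_assert(digit_in_base('z', 36) == 35, "base-36 digit");
static_assert(digit_in_base('9', 8) == -1, "9 is not an octal digit");
```

A wide-character variant would presumably need the same switch over `L'0'`, `L'a'`, and so on, which could be handled with a template or a second overload; that part is left open here.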

llvmbot commented 2 hours ago

@llvm/issue-subscribers-libc

Author: Michael Jones (michaelrj-google)
