llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.81k stars 11.91k forks source link

clang-tidy bugprone-signed-char-misuse triggers on int8_t to int cast #108943

Open davidrohr opened 1 month ago

davidrohr commented 1 month ago

When running the following code through clang-tidy with bugprone-signed-char-misuse

    int8_t y = 0;
    int32_t x = y;

it prints

[warning: 'signed char' to 'int32_t' (aka 'int') conversion; consider casting to 'unsigned char' first. [bugprone-signed-char-misuse]](javascript:;)
    7 |     int32_t x = y;

I think we should not print this warning when int8_t is used. The warning is to catch character to int conversions, but when int8_t is used, the user normally wants explicitly a signed integer.

In addition, I wonder actually why the check triggers on

    signed char y = 0;
    int32_t x = y;

Since here already I indicate that I want a signed char. AFAIK, char, signed char, unsigned char are 3 different types, and whether char becomes signed char or unsigned char is implementation-defined. IMHO, the check should only trigger on char, i.e. only for

    char y = 0;
    int32_t x = y;
FrankHB commented 3 weeks ago

Strong +1 for differentiating the case of signed char from char. The current rules effectively discourage the effort of avoiding the abuse of signed char to represent characters (instead of minimal-width signed integers), which is confusing. The type signed char is better avoided to represent characters normally because it has no traditional, conventional or idiomatic meaning about characters (and char should almost always be used instead). Each instance of using signed char as characters (e.g. converted implicitly from or to char values) may incur a question: is it really needed? This can derserve another dedicated check prior to the one here.

Additionally, the use of signed char representing integers should better be the normal case, even if not that usual. Whitelisting this use by types in CharTypdefsToIgnore is cumbersome and (argubly) not idiomatic, although there can be a few, say, FreeType's FT_Char. Note that in FreeType FT_Char is only used in its cache interface exactly for integers (not characters). FT_Char and FT_Byte here are confusing and inconsistent with other FreeTypes APIs.

Moreover, there can be support of well-known aliases of types like int8_t as the exceptional cases by default, instead of tweaking CharTypdefsToIgnore.