llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[clang or clang-tidy] -Wformat awareness for <cinttypes> #41959

Open LebedevRI opened 5 years ago

LebedevRI commented 5 years ago
Bugzilla Link 42614
Version trunk
OS Linux
CC @AaronBallman,@JonasToth,@gribozavr,@zygoloid,@rjmccall

Extended Description

printf()-style functions have a well-defined format string (https://en.cppreference.com/w/cpp/io/c/fprintf), and clang knows how to verify it (-Wformat).

However, there is a huge pitfall hiding in plain sight. Suppose a variable is declared as uint64_t. Typically (e.g. on 64-bit Linux) that is an 'unsigned long', so one will just use %lu, which is what clang recommends. But on a different platform uint64_t can be 'unsigned long long', and there -Wformat will complain that '%llu' should be used.

Neither of these is the "correct" solution: PRIu64 should be used instead. https://en.cppreference.com/w/cpp/header/cinttypes
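To make the pitfall concrete, here is a minimal sketch of the two non-portable spellings next to the PRIu64 one. The helper name `format_u64` is made up for illustration and is not part of any library; the commented-out lines are the variants that -Wformat accepts on one platform and rejects on another.

```cpp
#include <cassert>    // assert
#include <cinttypes>  // PRIu64
#include <cstdint>    // std::uint64_t
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical helper (illustration only): formats a 64-bit unsigned value.
std::string format_u64(std::uint64_t value) {
    char buf[32];  // a 64-bit value needs at most 20 digits plus the terminator

    // Non-portable spellings, each correct only on some platforms:
    //   std::snprintf(buf, sizeof buf, "%lu", value);   // uint64_t == unsigned long
    //   std::snprintf(buf, sizeof buf, "%llu", value);  // uint64_t == unsigned long long

    // Portable spelling: PRIu64 expands to the right conversion for this target.
    const int written = std::snprintf(buf, sizeof buf, "%" PRIu64, value);
    assert(written > 0 && written < static_cast<int>(sizeof buf));
    (void)written;  // silence unused-variable warnings when NDEBUG is defined
    return std::string(buf);
}
```

Calling `format_u64(UINT64_MAX)` should give the same decimal string on every target, whichever underlying type uint64_t happens to be.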

Now, obviously the current -Wformat behavior isn't wrong: its suggestions are correct for the current platform. But they aren't great, since they fail to catch platform-dependent format strings and in fact actively advertise them.

While I expect it may be reasonably trivial to distinguish whether the printf() argument is int or int32_t (for example), I'm honestly not sure how to deal with format string parsing. The current approach won't work, because the check would need to happen before macro substitution.

thesamesam commented 2 years ago

FWIW, the GCC counterpart of this bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78014.

frederick-vs-ja commented 2 years ago

PRIu64 is a macro that expands to a string literal and is generally concatenated later. How can the compiler know whether it has been used? Is it even possible for string literals to carry such additional information?
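A small sketch of why this is hard, assuming a target where PRIu64 expands to "lu" or "llu" (common, but not guaranteed by the standard). The names `kPortableFmt` and `matches_hardcoded_spelling` are invented for the example. After preprocessing and literal concatenation, the portable spelling is byte-for-byte the same as one of the hard-coded ones, so a checker that only sees the final literal has nothing to tell them apart.

```cpp
#include <cassert>    // assert
#include <cinttypes>  // PRIu64
#include <cstring>    // std::strcmp, std::strchr

// Adjacent string literals are concatenated after macro expansion, so this is
// a single plain literal by the time the format string is checked.
constexpr char kPortableFmt[] = "value=%" PRIu64 "\n";

// Hypothetical helper (illustration only): reports whether the portable
// spelling ended up identical to one of the hard-coded spellings on this target.
bool matches_hardcoded_spelling() {
    // The conversion specifier survived concatenation as ordinary characters.
    assert(std::strchr(kPortableFmt, '%') != nullptr);
    return std::strcmp(kPortableFmt, "value=%lu\n") == 0 ||
           std::strcmp(kPortableFmt, "value=%llu\n") == 0;
}
```

On targets where the function returns true, any diagnostic that wants to recommend PRIu64 would have to look at the tokens before macro expansion rather than at the resulting string.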

I think the new length modifiers (in C23) added by WG14-N2680 can be helpful.