llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[analyzer] EnumCastOutOfRange false-positive for bitmasks #48725

Open steakhal opened 3 years ago

steakhal commented 3 years ago
Bugzilla Link 49381
Version 11.0
OS All
CC @devincoughlin,@whisperity

Extended Description

The EnumCastOutOfRange checker emits a warning for this code:

```cpp
// -std=c++17 -Xclang -analyze -Xclang -analyzer-checker=core,alpha.cplusplus.EnumCastOutOfRange -Xclang -analyzer-output=text
// https://godbolt.org/z/xosTxq

enum TestEnum { A = 0, B = 1, C = 2, D = 4 };

void bitmasks() {
  static_assert((B | C | D) == 7);
  TestEnum t = static_cast<TestEnum>(7); // should not warn
  (void)t;
}
```

The checker should recognize the bitmask and check the bounds accordingly. /CC Endre Fülöp, the co-author of the checker

whisperity commented 3 years ago

> I guess, if you are right then this code is suspicious: https://github.com/llvm/llvm-project/blob/53298b8c8d74d0064c673db18addfe973b544dbb/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h#L1529-L1545

Indeed, I've fixated on a single rule, my bad. The problem is that you have to look at the law from above, not just at individual case decisions.

Basically, it is true that if you say an enum is between 0 and 4, then a type shall be selected that must be able to represent [0, 4]. However, the "unfortunate" reality is that the requirement on the selected integral type (whether a standard one or an implementation-defined one) is that, for an N-bit signed representation, it must be able to represent -2^(N-1) to 2^(N-1) - 1 [http://eel.is/c++draft/basic.fundamental].

The other requirement is that if you would otherwise fit into an int, the implementation is prohibited from choosing a type larger than int (let's say uint4096_t or something like that).

So in the case of a required range [0, 4], we need at least 3 bits. The smallest possible integral type is char, which is at least 8 bits wide. However, because you've written "0", it will be a signed value [http://eel.is/c++draft/dcl.enum#5.1]. Signed integral types take precedence in the type list over their unsigned counterparts.

So the smallest implicit underlying type the compiler is required to select is "signed char", giving us the valid range [-128, 127]. And all of these values will be valid, even though the enum did not define them. (But not [0, 255]!)

I believe Clang should really be able to tell us what range it actually selected for the enum's underlying type. Although I fear Clang is lazy, and even for such a simple enum of elements [0, 4], it will just go with int, which is wider than what's truly required.

> Then just imagine that the underlying type is fixed. The issue is the same. This is a clear and easy bug in this checker.

Do you mean that even in the case of

```cpp
enum POSIXACLBits : unsigned char
```

the checker still warns for the cast of 7?

steakhal commented 3 years ago

I guess, if you are right then this code is suspicious: https://github.com/llvm/llvm-project/blob/53298b8c8d74d0064c673db18addfe973b544dbb/clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h#L1529-L1545

steakhal commented 3 years ago

Then just imagine that the underlying type is fixed. The issue is the same. This is a clear and easy bug in this checker.

whisperity commented 3 years ago

I am not sure about C, but in C++, the TestEnum (with the value range [0, 4]) is NOT required to be able to represent "7".

As per http://eel.is/c++draft/dcl.enum#7:

> For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. [...]

7 is not defined in the enumeration. So the compiler is not required to present you with an underlying integral type that might be able to represent 7. The only requirement is that up to 4 shall be representable.


As a side-note, if you consider the bitmask enum helper functions in LLVM's own codebase (include/llvm/ADT/BitmaskEnum.h), the idiomatic usage of bitmasks in LLVM's code is:

```cpp
enum MyFlags {
  A = 0,
  B = 1,
  C = 2,
  D = 4,

  LLVM_MARK_AS_BITMASK_ENUM(/* LargestValue = */ D)
};
```

LLVM_MARK_AS_BITMASK_ENUM expands to

```cpp
LLVM_BITMASK_LARGEST_ENUMERATOR = LargestValue
```

I am not sure what the benefit of this extra declaration is from the user's PoV, since reading LLVM_BITMASK_LARGEST_ENUMERATOR will still give you the same value, 4 in our case. So it doesn't become the next power of 2, although there are some hints inside about calculating powers of 2.

llvmbot commented 3 years ago

Thanks for reaching out!

IMHO in C++ this is definitely the expected warning. I found this motivating example, with a valid workaround, for combining bitmasks: https://stackoverflow.com/questions/4920783/combining-enum-value-using-bitmask

In C this could be debatable, and if there are indeed multiple real code examples of this sloppy enum use, then at least the language mode could be taken into account.

steakhal commented 3 years ago

assigned to @devincoughlin