llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[analyzer] non-active union member accesses are not modeled #69847

Open tianxinghe opened 1 year ago

tianxinghe commented 1 year ago

For the following source code:

  union u
  {
    int First;
    int Second;
  };

  int main(void) {
    int data;
    data = 0;
    union u n;
    n.First = data;                      /* stores 0 through First */
    int result = (int)(100 / n.Second);  /* loads the same storage through Second */
    return 0;
  }

In Clang's abstract syntax tree (AST), the union members First and Second are treated as two distinct variables with no alias relationship. In the generated LLVM IR, however, they alias the same storage.

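Clang AST: ![image](https://github.com/llvm/llvm-project/assets/26410605/a909733f-72cb-4980-bc3f-2de3a997abdb)

The second binary operator assigns 0 to n.First. Because the AST does not treat the two members as aliases, the Clang Static Analyzer (CSA) treats n.Second as unknown when it processes the third binary operator (the division).

LLVM IR: ![image](https://github.com/llvm/llvm-project/assets/26410605/0c1f66d8-72e5-4b8a-aa64-38648ad4f035)

Lines 17 and 19 of the IR operate on the same memory location.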

I think this may cause accuracy issues when the analyzer handles unions. @steakhal @haoNoQ @EugeneZelenko
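The aliasing described above can be sketched in plain C, assuming the file is compiled as C rather than C++. Both members start at the same address, so a value stored through First is what a load through Second returns at run time. The helper names `members_share_address` and `write_first_read_second` are hypothetical and exist only for this illustration.

```c
/* Same union as in the report. */
union u {
    int First;
    int Second;
};

/* Hypothetical helper: returns 1 if both members start at the same address. */
static int members_share_address(void) {
    union u n;
    return (void *)&n.First == (void *)&n.Second;
}

/* Hypothetical helper: stores through First, then loads through Second. */
static int write_first_read_second(int value) {
    union u n;
    n.First = value;   /* store through one member */
    return n.Second;   /* load the same bytes through the other member */
}
```

With this layout, `write_first_read_second(0)` yields 0, which is why `100 / n.Second` in the report divides by zero at run time even though the analyzer treats n.Second as unknown.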

tbaederr commented 1 year ago

Writing to First and reading from Second is UB

tianxinghe commented 1 year ago

> Writing to First and reading from Second is UB

Thank you! Can it cause pointer-related problems (e.g. null pointer dereference, use-after-free, memory leak) in certain situations? I'd like to check for it in CSA.

steakhal commented 1 year ago

AFAIK, type punning through unions is blessed by Clang, which promises not to exploit that UB. CSA is not a verification tool; it does not model unions the way the hardware lays them out, and only the active member is tracked.
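Since only the active member is tracked, one possible workaround is to read back through the member that was last written and to guard the divisor explicitly. The following is a minimal sketch under that assumption, not a statement about what any particular analyzer version reports. The function name `divide_by_active_member` and its status-code convention are hypothetical.

```c
/* Same union as in the report. */
union u {
    int First;
    int Second;
};

/*
 * Hypothetical helper: writes and reads the SAME member, so the stored
 * value and the loaded value belong to one member instead of two.
 * Returns 0 on success, -1 if the divisor is zero (result is untouched).
 */
static int divide_by_active_member(int data, int *result) {
    union u n;
    n.First = data;
    if (n.First == 0) {        /* explicit guard on the member just written */
        return -1;
    }
    *result = 100 / n.First;   /* divisor is known to be non-zero here */
    return 0;
}
```

A file containing code like this can be passed to `clang --analyze` to see how the analyzer handles the same-member form compared with the cross-member read in the original report.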

tianxinghe commented 1 year ago

> AFAIK, type punning through unions is blessed by Clang, which promises not to exploit that UB. CSA is not a verification tool; it does not model unions the way the hardware lays them out, and only the active member is tracked.

I get it. Thank you!