Open vzakhari opened 5 months ago
@llvm/issue-subscribers-flang-ir
Author: Slava Zakharin (vzakhari)
@llvm/issue-subscribers-bug
Thanks for pointing this out. I wonder if this is why enabling local alloca tbaa led to bugs.
My takeaway from this is that "per-function" tbaa trees weren't quite right. They need to be (somehow) different for each call. Does that sound right to you?
This is also a problem in classic flang:
$ $CLASSIC_FLANG -Ofast tbaa.f90 -o tbaa
$ ./tbaa
2.000000 2.000000 2.000000 2.000000
1.000000 1.000000 1.000000 1.000000
$ $CLASSIC_FLANG -O0 tbaa.f90 -o tbaa
$ ./tbaa
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000 2.000000 2.000000
Thank you for checking it with the classic flang, Tom!
Basically, TBAA is supposed to be insensitive to any code transformations, since it represents the language aliasing rules based on the variables' data types. So LLVM inlining is not supposed to worry about updating the TBAA metadata, and we are in trouble with our usage of TBAA, because after inlining the different call sites end up using TBAA trees with the same root.
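To make the failure mode concrete, here is a minimal sketch in Python of a simplified TBAA-style query. It is a toy model, not LLVM's implementation: the `Node` class, `may_alias`, `dummy_arg_tree`, and the tree and node names are all hypothetical. The model assumes that two tags under the same root may alias only if one is an ancestor of the other, and that tags under different roots are conservatively treated as possibly aliasing. The scenario assumes a subroutine `test(x, y)` that is called as `test(a, b)` and then as `test(b, a)`, with both calls inlined into the same caller.

```python
class Node:
    """One node of a TBAA-like type tree (toy model, hypothetical names)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this node and every node above it, ending at the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent

    def root(self):
        *_, last = self.ancestors()
        return last


def may_alias(a, b):
    """Simplified query: different roots -> may alias; same root -> only
    if one tag is an ancestor of (or equal to) the other."""
    if a.root() is not b.root():
        return True  # unrelated trees: nothing can be proven
    return any(n is b for n in a.ancestors()) or any(n is a for n in b.ancestors())


def dummy_arg_tree(root_name):
    """Build a tree with separate leaves for dummy arguments x and y."""
    root = Node(root_name)
    dummies = Node("dummy arg data", root)
    return Node("x", dummies), Node("y", dummies)


# One tree per function: both inlined calls reuse the same x and y tags.
x, y = dummy_arg_tree("test")
# Call 1 stores through x (memory `a`); call 2 loads through y (also `a`).
shared = may_alias(x, y)  # False: "no alias", although both touch `a`

# One tree per call site: the tags of different calls have different roots.
x1, y1 = dummy_arg_tree("test, call 1")
x2, y2 = dummy_arg_tree("test, call 2")
per_call = may_alias(x1, y2)   # True: conservative across calls
same_call = may_alias(x1, y1)  # False: still "no alias" within one call
```

In this model the shared tree answers "no alias" for two accesses that hit the same memory, while distinct roots per call keep the cross-call answer conservative without losing the within-call information.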
I think we can only make it right with the Full Restrict support (though I haven't heard recent updates on it): https://discourse.llvm.org/t/full-restrict-support-status-update/53514
Yeah, I agree Full Restrict will be much better once it arrives.
The per-function TBAA information created for the Fortran dummy arguments may become invalid after LLVM inlining. I am not sure if we have an existing issue for this.
I started from the following Fortran source:
repro1.ll.gz - LLVM IR on entry to LLVM produced by flang-new -Ofast alias.f90 -c -march=skylake with an added noinline attribute for test_.
repro2.ll.gz - modified LLVM IR to demonstrate the problem.
After inlining and other optimization passes applied to repro1.ll the LLVM IR for test_ looks like this:
TBAA tags !3 and !9 indicate no aliasing for the instructions they are attached to. This is obviously incorrect for store %4 and %8. I was not able to write a test that would force LLVM to incorrectly reorder the store and the load, so did it manually in repro2.ll.
The incorrect behavior may be seen with flang-new -O0 repro2.ll; ./a.out:
Expected result (flang-new -O0 repro1.ll; ./a.out):
@tblah, FYI