llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.3k stars 11.68k forks source link

[flang] TBAA information is incorrect after LLVM inlining #89829

Open vzakhari opened 5 months ago

vzakhari commented 5 months ago

The per function TBAA information created for the Fortran dummy arguments may become invalid after LLVM inlining. I am not sure if we have existing issue for this.

I started from the following Fortran source:

subroutine test(x, y, i)
  real, target :: x(4), y(4)
  integer, value :: i
  call inner(x, y)
  call inner(y, x(i:i))
contains
  subroutine inner(x, y)
    real :: x(:), y(:)
    x(1:4) = y(1:4)
  end subroutine inner
end subroutine test

program main
  interface
     subroutine test(x, y, i)
       real, target :: x(4), y(4)
       integer, value :: i
     end subroutine test
  end interface
  real :: x(4), y(4)
  x = 1.0
  y = 2.0
  call test(x, y, 1)
  print *, x
  print *, y
end program main

repro1.ll.gz - LLVM IR on entry to LLVM produced by flang-new -Ofast alias.f90 -c -march=skylake with an added noinline attribute for test_.

repro2.ll.gz - modified LLVM IR to demonstrate the problem.

After inlining and other optimization passes applied to repro1.ll the LLVM IR for test_ looks like this:

; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite)
define void @test_(ptr nocapture %0, ptr nocapture %1, i32 %2) local_unnamed_addr #0 {
  %4 = load <4 x float>, ptr %1, align 4, !tbaa !3
  store <4 x float> %4, ptr %0, align 4, !tbaa !9
  %5 = sext i32 %2 to i64
  %6 = add nsw i64 %5, -1
  %7 = getelementptr [4 x float], ptr %0, i64 0, i64 %6
  %8 = load <4 x float>, ptr %7, align 4, !tbaa !3
  store <4 x float> %8, ptr %1, align 4, !tbaa !9
  ret void
}
!3 = !{!4, !4, i64 0}
!4 = !{!"dummy arg data/_QFtestFinnerEy", !5, i64 0}
!5 = !{!"dummy arg data", !6, i64 0}
!6 = !{!"any data access", !7, i64 0}
!7 = !{!"any access", !8, i64 0}
!8 = !{!"Flang function root _QFtestPinner"}
!9 = !{!10, !10, i64 0}
!10 = !{!"dummy arg data/_QFtestFinnerEx", !5, i64 0}

TBAA tags !3 and !9 indicate no aliasing for the instructions they are attached to. This is obviously incorrect for store %4 and %8. I was not able to write a test that would force LLVM to incorrectly reorder the store and the load, so did it manually in repro2.ll. The incorrect behavior may be seen with: flang-new -O0 repro2.ll; ./a.out:

 2. 2. 2. 2.
 1. 1. 1. 1.

Expected result (flang-new -O0 repro1.ll; ./a.out):

 2. 2. 2. 2.
 2. 2. 2. 2.

@tblah, FYI

llvmbot commented 5 months ago

@llvm/issue-subscribers-flang-ir

Author: Slava Zakharin (vzakhari)

The per function TBAA information created for the Fortran dummy arguments may become invalid after LLVM inlining. I am not sure if we have existing issue for this. I started from the following Fortran source: ``` subroutine test(x, y, i) real, target :: x(4), y(4) integer, value :: i call inner(x, y) call inner(y, x(i:i)) contains subroutine inner(x, y) real :: x(:), y(:) x(1:4) = y(1:4) end subroutine inner end subroutine test program main interface subroutine test(x, y, i) real, target :: x(4), y(4) integer, value :: i end subroutine test end interface real :: x(4), y(4) x = 1.0 y = 2.0 call test(x, y, 1) print *, x print *, y end program main ``` [repro1.ll.gz](https://github.com/llvm/llvm-project/files/15082727/repro1.ll.gz) - LLVM IR on entry to LLVM produced by `flang-new -Ofast alias.f90 -c -march=skylake` with an added `noinline` attribute for `test_`. [repro2.ll.gz](https://github.com/llvm/llvm-project/files/15082729/repro2.ll.gz) - modified LLVM IR to demonstrate the problem. After inlining and other optimization passes applied to `repro1.ll` the LLVM IR for `test_` looks like this: ``` ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite) define void @test_(ptr nocapture %0, ptr nocapture %1, i32 %2) local_unnamed_addr #0 { %4 = load <4 x float>, ptr %1, align 4, !tbaa !3 store <4 x float> %4, ptr %0, align 4, !tbaa !9 %5 = sext i32 %2 to i64 %6 = add nsw i64 %5, -1 %7 = getelementptr [4 x float], ptr %0, i64 0, i64 %6 %8 = load <4 x float>, ptr %7, align 4, !tbaa !3 store <4 x float> %8, ptr %1, align 4, !tbaa !9 ret void } !3 = !{!4, !4, i64 0} !4 = !{!"dummy arg data/_QFtestFinnerEy", !5, i64 0} !5 = !{!"dummy arg data", !6, i64 0} !6 = !{!"any data access", !7, i64 0} !7 = !{!"any access", !8, i64 0} !8 = !{!"Flang function root _QFtestPinner"} !9 = !{!10, !10, i64 0} !10 = !{!"dummy arg data/_QFtestFinnerEx", !5, i64 0} ``` TBAA tags `!3` and `!9` indicate no aliasing for the instructions they are attached to. This is obviously incorrect for `store %4` and `%8`. I was not able to write a test that would force LLVM to incorrectly reorder the store and the load, so did it manually in `repro2.ll`. The incorrect behavior may be seen with: `flang-new -O0 repro2.ll; ./a.out`: ``` 2. 2. 2. 2. 1. 1. 1. 1. ``` Expected result (`flang-new -O0 repro1.ll; ./a.out`): ``` 2. 2. 2. 2. 2. 2. 2. 2. ``` @tblah, FYI
llvmbot commented 5 months ago

@llvm/issue-subscribers-bug

Author: Slava Zakharin (vzakhari)

The per function TBAA information created for the Fortran dummy arguments may become invalid after LLVM inlining. I am not sure if we have existing issue for this. I started from the following Fortran source: ``` subroutine test(x, y, i) real, target :: x(4), y(4) integer, value :: i call inner(x, y) call inner(y, x(i:i)) contains subroutine inner(x, y) real :: x(:), y(:) x(1:4) = y(1:4) end subroutine inner end subroutine test program main interface subroutine test(x, y, i) real, target :: x(4), y(4) integer, value :: i end subroutine test end interface real :: x(4), y(4) x = 1.0 y = 2.0 call test(x, y, 1) print *, x print *, y end program main ``` [repro1.ll.gz](https://github.com/llvm/llvm-project/files/15082727/repro1.ll.gz) - LLVM IR on entry to LLVM produced by `flang-new -Ofast alias.f90 -c -march=skylake` with an added `noinline` attribute for `test_`. [repro2.ll.gz](https://github.com/llvm/llvm-project/files/15082729/repro2.ll.gz) - modified LLVM IR to demonstrate the problem. After inlining and other optimization passes applied to `repro1.ll` the LLVM IR for `test_` looks like this: ``` ; Function Attrs: mustprogress nofree noinline norecurse nosync nounwind willreturn memory(argmem: readwrite) define void @test_(ptr nocapture %0, ptr nocapture %1, i32 %2) local_unnamed_addr #0 { %4 = load <4 x float>, ptr %1, align 4, !tbaa !3 store <4 x float> %4, ptr %0, align 4, !tbaa !9 %5 = sext i32 %2 to i64 %6 = add nsw i64 %5, -1 %7 = getelementptr [4 x float], ptr %0, i64 0, i64 %6 %8 = load <4 x float>, ptr %7, align 4, !tbaa !3 store <4 x float> %8, ptr %1, align 4, !tbaa !9 ret void } !3 = !{!4, !4, i64 0} !4 = !{!"dummy arg data/_QFtestFinnerEy", !5, i64 0} !5 = !{!"dummy arg data", !6, i64 0} !6 = !{!"any data access", !7, i64 0} !7 = !{!"any access", !8, i64 0} !8 = !{!"Flang function root _QFtestPinner"} !9 = !{!10, !10, i64 0} !10 = !{!"dummy arg data/_QFtestFinnerEx", !5, i64 0} ``` TBAA tags `!3` and `!9` indicate no aliasing for the instructions they are attached to. This is obviously incorrect for `store %4` and `%8`. I was not able to write a test that would force LLVM to incorrectly reorder the store and the load, so did it manually in `repro2.ll`. The incorrect behavior may be seen with: `flang-new -O0 repro2.ll; ./a.out`: ``` 2. 2. 2. 2. 1. 1. 1. 1. ``` Expected result (`flang-new -O0 repro1.ll; ./a.out`): ``` 2. 2. 2. 2. 2. 2. 2. 2. ``` @tblah, FYI
tblah commented 5 months ago

Thanks for pointing this out. I wonder if this is why enabling local alloca tbaa led to bugs.

My takeaway from this is that "per-function" tbaa trees wasn't quite right. They need to be (somehow) different for each call. Does that sound right to you?

tblah commented 5 months ago

This is also a problem in classic flang

$ $CLASSIC_FLANG -Ofast tbaa.f90 -o tbaa
$ ./tbaa
    2.000000        2.000000        2.000000        2.000000
    1.000000        1.000000        1.000000        1.000000
$ $CLASSIC_FLANG -O0 tbaa.f90 -o tbaa
$ ./tbaa
    2.000000        2.000000        2.000000        2.000000
    2.000000        2.000000        2.000000        2.000000
vzakhari commented 5 months ago

Thank you for checking it with the classic flang, Tom!

Basically, TBAA is supposed to be insensitive to any code transformations, since it represents the language aliasing rules based on the variables' data types. So LLVM inlining is not supposed to worry about updating the TBAA metadata, and we are in trouble with our usage of TBAA, because the different call sites end up using TBAA tree with the same root.

I think we can only make it right with the Full Restrict support (though I haven't heard recent updates on it): https://discourse.llvm.org/t/full-restrict-support-status-update/53514

tblah commented 5 months ago

Yeah I agree full restrict will be much better once it arrives.