llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[flang][hlfir] SPEC CPU2006/437.leslie3d 5% performance regression #65413

Closed vzakhari closed 1 year ago

vzakhari commented 1 year ago

With HLFIR lowering, the benchmark runs about 5% slower than with FIR lowering on Icelake (120.5 seconds vs. 114.5 seconds).

The slowdown is related to extra temporaries created for some assignments, e.g.:

      SUBROUTINE UPDATE()
...

      DO K = 1, KMAX - 1
         DO J = 1, JMAX - 1

            Q(1:I2,J,K,1,M) = (RNM1 * Q(1:I2,J,K,1,M) +
     >                         Q(1:I2,J,K,1,N) + DU(1:I2,J,K,1)) * RNI

Q and DU are module ALLOCATABLE variables; RNM1 and RNI are local scalars. Because the base memrefs of the designators are produced by box loads, the alias analysis currently cannot prove that the accesses to Q and DU do not alias. This prevents the optimized bufferization pass from eliminating the temporary:

  %42:2 = hlfir.declare %41 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QMles3d_dataEdu"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>)
...
  %199:2 = hlfir.declare %198 {fortran_attrs = #fir.var_attrs<allocatable>, uniq_name = "_QMles3d_dataEq"} : (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?x?xf64>>>>) -> (!fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?x?xf64>>>>, !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?x?xf64>>>>)
...
      %339 = fir.load %199#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?x?xf64>>>>
      %340 = fir.load %82#0 : !fir.ref<i32>
      %341 = fir.convert %340 : (i32) -> index
      %342 = arith.cmpi sgt, %341, %c0 : index
      %343 = arith.select %342, %341, %c0 : index
      %344 = fir.load %121#0 : !fir.ref<i32>
      %345 = fir.convert %344 : (i32) -> i64
      %346 = fir.load %137#0 : !fir.ref<i32>
      %347 = fir.convert %346 : (i32) -> i64
      %348 = fir.load %157#0 : !fir.ref<i32>
      %349 = fir.convert %348 : (i32) -> i64
      %350 = fir.shape %343 : (index) -> !fir.shape<1>
      %351 = hlfir.designate %339 (%c1:%341:%c1, %345, %347, %c1, %349)  shape %350 : (!fir.box<!fir.heap<!fir.array<?x?x?x?x?xf64>>>, index, index, index, i64, i64, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
      %352 = fir.load %159#0 : !fir.ref<i32>
      %353 = fir.convert %352 : (i32) -> i64
      %354 = hlfir.designate %339 (%c1:%341:%c1, %345, %347, %c1, %353)  shape %350 : (!fir.box<!fir.heap<!fir.array<?x?x?x?x?xf64>>>, index, index, index, i64, i64, index, i64, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
      %355 = fir.load %42#0 : !fir.ref<!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>>
      %356 = hlfir.designate %355 (%c1:%341:%c1, %345, %347, %c1)  shape %350 : (!fir.box<!fir.heap<!fir.array<?x?x?x?xf64>>>, index, index, index, i64, i64, index, !fir.shape<1>) -> !fir.box<!fir.array<?xf64>>
      %357 = fir.load %226#0 : !fir.ref<f64>
      %358 = hlfir.elemental %350 unordered : (!fir.shape<1>) -> !hlfir.expr<?xf64> {
      ^bb0(%arg4: index):
        %568 = hlfir.designate %351 (%arg4)  : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
        %569 = fir.load %568 : !fir.ref<f64>
        %570 = arith.mulf %338, %569 fastmath<fast> : f64
        %571 = hlfir.designate %354 (%arg4)  : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
        %572 = fir.load %571 : !fir.ref<f64>
        %573 = arith.addf %570, %572 fastmath<fast> : f64
        %574 = hlfir.designate %356 (%arg4)  : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
        %575 = fir.load %574 : !fir.ref<f64>
        %576 = arith.addf %573, %575 fastmath<fast> : f64
        %577 = hlfir.no_reassoc %576 : f64
        %578 = arith.mulf %577, %357 fastmath<fast> : f64
        hlfir.yield_element %578 : f64
      }
      hlfir.assign %358 to %351 : !hlfir.expr<?xf64>, !fir.box<!fir.array<?xf64>>
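
To illustrate why the temporary matters, here is a minimal Python sketch. It is not Flang code, and every name in it is hypothetical. It contrasts evaluating the right-hand side into a temporary before storing with storing each element as soon as it is computed. For an assignment shaped like the one in UPDATE, where each right-hand-side read uses the same index as the store or a different array, both strategies give the same result, so the temporary is pure overhead. For a shifted, overlapping section they differ, which is why the temporary can only be dropped when the accesses are known not to conflict.

```python
def assign_with_temp(dst, lo, n, rhs):
    """Evaluate the whole right-hand side first, then copy into dst[lo:lo+n]."""
    tmp = [rhs(i) for i in range(n)]
    for i in range(n):
        dst[lo + i] = tmp[i]


def assign_in_place(dst, lo, n, rhs):
    """Store each element immediately after computing it (no temporary)."""
    for i in range(n):
        dst[lo + i] = rhs(i)


def update_like_case(assign):
    """Same shape as the assignment in UPDATE: reads use the same index as the store."""
    rnm1, rni = 2.0, 0.5                  # stand-ins for RNM1 and RNI
    q_m = [1.0, 2.0, 3.0, 4.0]            # stand-in for Q(1:I2,J,K,1,M)
    q_n = [10.0, 20.0, 30.0, 40.0]        # stand-in for Q(1:I2,J,K,1,N)
    du = [0.5, 0.5, 0.5, 0.5]             # stand-in for DU(1:I2,J,K,1)
    assign(q_m, 0, 4, lambda i: (rnm1 * q_m[i] + q_n[i] + du[i]) * rni)
    return q_m


def shifted_case(assign):
    """Fortran analogue: A(2:4) = A(1:3), where reads and stores overlap with a shift."""
    a = [1, 2, 3, 4]
    assign(a, 1, 3, lambda i: a[i])
    return a


if __name__ == "__main__":
    print(update_like_case(assign_with_temp) == update_like_case(assign_in_place))
    print(shifted_case(assign_with_temp), shifted_case(assign_in_place))
```

In the sketch the arrays are visibly distinct lists. In the IR above, the compiler has to establish the equivalent fact from the loaded boxes, and that is the step that currently fails.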

vzakhari commented 1 year ago

Fixed by https://github.com/llvm/llvm-project/pull/67353