llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[Affine fusion] Uninitialized memory region is accessed after loops fusion #81601

Open m-ly4 opened 6 months ago

m-ly4 commented 6 months ago

IIUC, the affine-loop-fusion pass doesn't take into account memory accesses in regions nested inside ops other than affine.for. In some cases this leads to the removal of operations that another piece of code still depends on. Example (a minimal reproducer reduced from real code):

func.func @func() {
  %cst = arith.constant 1 : i32
  %alloc = memref.alloc() : memref<16xi32>
  affine.for %arg0 = 0 to 16 {
    affine.store %cst, %alloc[%arg0] : memref<16xi32>
  }
  affine.for %arg0 = 0 to 16 {
    %1 = affine.load %alloc[%arg0] : memref<16xi32>
  }
  %0 = arith.cmpi eq, %cst, %cst : i32
  scf.if %0 {
    affine.for %arg0 = 0 to 16 {
      %1 = affine.load %alloc[%arg0] : memref<16xi32>
    }
  }
  return
}

Command: mlir-opt -pass-pipeline='builtin.module(func.func(affine-loop-fusion))' example.mlir

Output:

module {
  func.func @func() {
    %alloc = memref.alloc() : memref<1xi32>
    %c1_i32 = arith.constant 1 : i32
    %alloc_0 = memref.alloc() : memref<16xi32>
    affine.for %arg0 = 0 to 16 {
      affine.store %c1_i32, %alloc[0] : memref<1xi32>
      %1 = affine.load %alloc[0] : memref<1xi32>
    }
    %0 = arith.cmpi eq, %c1_i32, %c1_i32 : i32
    scf.if %0 {
      affine.for %arg0 = 0 to 16 {
        %1 = affine.load %alloc_0[%arg0] : memref<16xi32>
      }
    }
    return
  }
}

So the original 16-element buffer (alloc_0 in the output) is never initialized, but it is still read by the loop inside scf.if after the fused loop.
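For comparison, here is a hand-written sketch of one output that would preserve the original semantics. It is not produced by mlir-opt and is not necessarily what a fixed pass would emit; it only assumes that fusion keeps the store going to the original 16-element buffer, because that buffer has a reader outside the fused loops:

```mlir
func.func @func() {
  %cst = arith.constant 1 : i32
  %alloc = memref.alloc() : memref<16xi32>
  // Producer and first consumer are fused, but the store still targets
  // the original buffer instead of a private memref<1xi32>.
  affine.for %arg0 = 0 to 16 {
    affine.store %cst, %alloc[%arg0] : memref<16xi32>
    %1 = affine.load %alloc[%arg0] : memref<16xi32>
  }
  %0 = arith.cmpi eq, %cst, %cst : i32
  scf.if %0 {
    // This load now reads initialized data.
    affine.for %arg0 = 0 to 16 {
      %1 = affine.load %alloc[%arg0] : memref<16xi32>
    }
  }
  return
}
```

Leaving the three loops unfused would of course also be correct.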

The behavior is the same even if I use an affine.if operation, as in the example below:

#set0 = affine_set<(d0) : (1 == 0)>
func.func @func() {
  %cst = arith.constant 1 : i32
  %alloc = memref.alloc() : memref<16xi32>
  affine.for %arg0 = 0 to 16 {
    affine.store %cst, %alloc[%arg0] : memref<16xi32>
  }
  affine.for %arg0 = 0 to 16 {
    %1 = affine.load %alloc[%arg0] : memref<16xi32>
  }
  %0 = arith.index_cast %cst : i32 to index
  affine.if #set0(%0) {
    affine.for %arg0 = 0 to 16 {
      %1 = affine.load %alloc[%arg0] : memref<16xi32>
    }
  }
  return
}

Hash of the latest llvm-project commit at the time of testing: c230138011cbf07ad7caf9d256ae9d0c5032a974

llvmbot commented 6 months ago

@llvm/issue-subscribers-mlir-affine

sgjzfzzf commented 2 months ago

Hi, I ran into a similar issue. Could you please share how you eventually solved it?