Inconsistent results when using --buffer-deallocation

wangyongj1a commented 5 months ago

I have the following MLIR program, test.mlir:

module {
  func.func @func0(%arg0: tensor<19xi32>) -> () {
    %0 = arith.constant 2 : index
    %1 = tensor.extract %arg0[%0] : tensor<19xi32>
    vector.print %1 : i32
    return
  }
  func.func private @func1() {
    %0 = arith.constant 10 : i32
    %1 = tensor.from_elements %0 : tensor<1xi32>
    %2 = tensor.from_elements %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0 : tensor<19xi32>
    call @func0(%2) : (tensor<19xi32>) -> ()
    return
  }
}

When I lowered the program with mlir-opt --tensor-bufferize --buffer-deallocation --convert-scf-to-cf --convert-cf-to-llvm --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts test.mlir and ran the resulting executable, I got inconsistent results across multiple runs. I noticed that after only the passes --tensor-bufferize --buffer-deallocation, the program is lowered to:

module {
  func.func @func0(%arg0: tensor<19xi32>) {
    %0 = bufferization.to_memref %arg0 : memref<19xi32>
    %c2 = arith.constant 2 : index
    %1 = memref.load %0[%c2] : memref<19xi32>
    vector.print %1 : i32
    return
  }
  func.func private @func1() {
    %c10_i32 = arith.constant 10 : i32
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    %c0 = arith.constant 0 : index
    memref.store %c10_i32, %alloc[%c0] : memref<1xi32>
    memref.dealloc %alloc : memref<1xi32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<19xi32>
    %c0_1 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %c3 = arith.constant 3 : index
    %c4 = arith.constant 4 : index
    %c5 = arith.constant 5 : index
    %c6 = arith.constant 6 : index
    %c7 = arith.constant 7 : index
    %c8 = arith.constant 8 : index
    %c9 = arith.constant 9 : index
    %c10 = arith.constant 10 : index
    %c11 = arith.constant 11 : index
    %c12 = arith.constant 12 : index
    %c13 = arith.constant 13 : index
    %c14 = arith.constant 14 : index
    %c15 = arith.constant 15 : index
    %c16 = arith.constant 16 : index
    %c17 = arith.constant 17 : index
    %c18 = arith.constant 18 : index
    memref.store %c10_i32, %alloc_0[%c0_1] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c1] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c2] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c3] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c4] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c5] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c6] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c7] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c8] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c9] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c10] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c11] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c12] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c13] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c14] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c15] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c16] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c17] : memref<19xi32>
    memref.store %c10_i32, %alloc_0[%c18] : memref<19xi32>
    %0 = bufferization.to_tensor %alloc_0 : memref<19xi32>
    memref.dealloc %alloc_0 : memref<19xi32>
    call @func0(%0) : (tensor<19xi32>) -> ()
    return
  }
}

The memory of %alloc_0 seems to be deallocated before the related tensor %0 is used as an argument in the function call, so @func0 reads from a buffer that has already been freed. I also tried running --buffer-deallocation-simplification after --buffer-deallocation, but it did not help in this case. I'm not sure whether there is a bug in my program or whether I am using --buffer-deallocation and --buffer-deallocation-simplification incorrectly. My git version is 4c79d38f82e1f6fe8575d88d8c74f2f1806b19ce.
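
For comparison, here is a hand-written sketch of the ordering I would have expected, with the dealloc placed after the last use of %0. This is only my assumption about what a correct result could look like, not the output of any pass, and to keep it short it stores only the element that @func0 actually reads:

```mlir
module {
  func.func @func0(%arg0: tensor<19xi32>) {
    %0 = bufferization.to_memref %arg0 : memref<19xi32>
    %c2 = arith.constant 2 : index
    %1 = memref.load %0[%c2] : memref<19xi32>
    vector.print %1 : i32
    return
  }
  func.func private @func1() {
    %c10_i32 = arith.constant 10 : i32
    %c2 = arith.constant 2 : index
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<19xi32>
    // Only index 2 is stored here for brevity; the real IR stores all 19 elements.
    memref.store %c10_i32, %alloc_0[%c2] : memref<19xi32>
    %0 = bufferization.to_tensor %alloc_0 : memref<19xi32>
    call @func0(%0) : (tensor<19xi32>) -> ()
    // Assumed placement: free the buffer only after the call that reads it via %0.
    memref.dealloc %alloc_0 : memref<19xi32>
    return
  }
}
```

The only difference from the IR above is the position of memref.dealloc relative to the call.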

llvmbot commented 5 months ago

@llvm/issue-subscribers-mlir

wangyongj1a commented 1 week ago

I made a small change: I cast the result of the function call to f32 and made the function return the cast value to satisfy mlir-cpu-runner. test.mlir:

module {
  func.func @func0(%arg0: tensor<19xi32>) -> i32 {
    %0 = arith.constant 2 : index
    %1 = tensor.extract %arg0[%0] : tensor<19xi32>
    vector.print %1 : i32
    return %1 : i32
  }
  func.func private @func1() -> f32 {
    %0 = arith.constant 10 : i32
    %1 = tensor.from_elements %0 : tensor<1xi32>
    %2 = tensor.from_elements %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0, %0 : tensor<19xi32>
    %3 = call @func0(%2) : (tensor<19xi32>) -> i32
    %4 = arith.sitofp %3 : i32 to f32
    return %4 : f32
  }
}

When I ran /data/tmp/v1029/llvm-project/build/bin/mlir-opt --one-shot-bufferize=dialect-filter=tensor,bufferization --convert-scf-to-cf --convert-cf-to-llvm --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --one-shot-bufferize=dialect-filter=tensor --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/v1029/llvm-project/build/bin/mlir-cpu-runner -e func1 --shared-libs=/data/tmp/v1029/llvm-project/build/lib/libmlir_runner_utils.so,/data/tmp/v1029/llvm-project/build/lib/libmlir_c_runner_utils.so, I got:

10
1.000000e+01

However, when I added --buffer-deallocation and ran /data/tmp/v1029/llvm-project/build/bin/mlir-opt --one-shot-bufferize=dialect-filter=tensor,bufferization --buffer-deallocation --convert-scf-to-cf --convert-cf-to-llvm --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --one-shot-bufferize=dialect-filter=tensor --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts test.mlir | /data/tmp/v1029/llvm-project/build/bin/mlir-cpu-runner -e func1 --shared-libs=/data/tmp/v1029/llvm-project/build/lib/libmlir_runner_utils.so,/data/tmp/v1029/llvm-project/build/lib/libmlir_c_runner_utils.so, I got inconsistent results across multiple runs.

The problem still seems to exist. I'm not sure whether there is a bug in my program or whether incorrect usage of the above passes caused these results. My git version is e19a5fc6d306a81d181a9597a8b25c444c08d722.

IanWood1 commented 1 week ago

I think the problem originates from this series of passes: --one-shot-bufferize=dialect-filter=tensor,bufferization --buffer-deallocation --func-bufferize. This results in (full IR here):

  %0 = bufferization.to_tensor %alloc_0 : memref<19xi32>
  %1 = bufferization.to_memref %0 : memref<19xi32>
  memref.dealloc %alloc_0 : memref<19xi32>
  %2 = call @func0(%1) : (memref<19xi32>) -> i32

which gets folded to (full IR here):

  memref.dealloc %alloc : memref<19xi32>
  %0 = call @func0(%alloc) : (memref<19xi32>) -> i32

I think it might be problematic to run deallocation before all ops have been bufferized. It could also be that the to_tensor -> to_memref pair must not be folded away and should instead be replaced with an allocation and a copy. I'm not sure, but hopefully this provides some context so others can help.
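
To make the second possibility concrete, here is a rough hand-written sketch of what the call site might look like if the to_tensor -> to_memref pair were replaced with an allocation and a copy rather than folded. This is purely illustrative and not the output of any existing pass: the name %copy, the elided stores, and the trailing dealloc of the copy are all my assumptions.

```mlir
module {
  // Declaration only; the body of @func0 is unchanged and omitted here.
  func.func private @func0(memref<19xi32>) -> i32
  func.func private @func1() -> f32 {
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<19xi32>
    // (the 19 stores into %alloc_0 are elided in this sketch)
    // Hypothetical: materialize a fresh buffer instead of folding the casts.
    %copy = memref.alloc() {alignment = 64 : i64} : memref<19xi32>
    memref.copy %alloc_0, %copy : memref<19xi32> to memref<19xi32>
    // The early dealloc is now harmless because the call no longer reads %alloc_0.
    memref.dealloc %alloc_0 : memref<19xi32>
    %2 = call @func0(%copy) : (memref<19xi32>) -> i32
    // Assumed: something must still free the copy after its last use.
    memref.dealloc %copy : memref<19xi32>
    %3 = arith.sitofp %2 : i32 to f32
    return %3 : f32
  }
}
```

This trades an extra allocation and copy for safety, so it would only make sense if the fold cannot be proven safe.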

llvm / llvm-project

Inconsistent results when using --buffer-deallocation #92056