llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Inconsistent result with --sparsification-and-bufferization and tensor.empty #92069

Open Anonymous15592 opened 2 months ago

Anonymous15592 commented 2 months ago

Consider the following MLIR program (a.mlir):

module {
  func.func @tensor_i32(%arg0: tensor<1xi32>) -> i32 {
    %idx0 = index.constant 0
    %0 = tensor.extract %arg0[%idx0] : tensor<1xi32>
    return %0 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %5 = tensor.empty() : tensor<1xi32> // using empty
    // %5 = tensor.from_elements %c0_i32 : tensor<1xi32>

    %inserted_28 = tensor.insert %c1_i32 into %5[%c0] : tensor<1xi32>
    %31 = call @tensor_i32(%inserted_28) : (tensor<1xi32>) -> i32
    %308 = tensor.extract %5[%c0] : tensor<1xi32>
    // vector.print %31 : i32
    vector.print %308 : i32
    return
  }
}

The program produces two different results under the following two pass sequences:

pass sequence1: --sparsification-and-bufferization --tensor-bufferize --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts

pass sequence2: --tensor-bufferize --func-bufferize --convert-func-to-llvm --convert-index-to-llvm --convert-vector-to-llvm --finalize-memref-to-llvm --convert-arith-to-llvm --reconcile-unrealized-casts

The executable built with pass sequence1 prints 1, while the one built with pass sequence2 prints 0. The only difference between the two is the additional --sparsification-and-bufferization at the beginning of pass sequence1.

To narrow this down, I further compared the output of just the bufferization portion of each sequence:

pass1: --sparsification-and-bufferization --tensor-bufferize
pass2: --tensor-bufferize

The result of pass1 is:

module {
  func.func @tensor_i32(%arg0: memref<1xi32>) -> i32 {
    %idx0 = index.constant 0
    %0 = memref.load %arg0[%idx0] : memref<1xi32>
    return %0 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    memref.store %c1_i32, %alloc[%c0] : memref<1xi32>
    %0 = call @tensor_i32(%alloc) : (memref<1xi32>) -> i32
    %1 = memref.load %alloc[%c0] : memref<1xi32>
    vector.print %1 : i32
    return
  }
}

The result of pass2 is:

module {
  func.func @tensor_i32(%arg0: tensor<1xi32>) -> i32 {
    %0 = bufferization.to_memref %arg0 : memref<1xi32>
    %idx0 = index.constant 0
    %1 = memref.load %0[%idx0] : memref<1xi32>
    return %1 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    %alloc_0 = memref.alloc() {alignment = 64 : i64} : memref<1xi32>
    memref.copy %alloc, %alloc_0 : memref<1xi32> to memref<1xi32>
    memref.store %c1_i32, %alloc_0[%c0] : memref<1xi32>
    %0 = bufferization.to_tensor %alloc_0 : memref<1xi32>
    %1 = call @tensor_i32(%0) : (tensor<1xi32>) -> i32
    %2 = memref.load %alloc[%c0] : memref<1xi32>
    vector.print %2 : i32
    return
  }
}

It seems that --sparsification-and-bufferization --tensor-bufferize treats the operand and the result of tensor.insert as the same tensor (memref) when the operand of tensor.insert is produced by tensor.empty: the insert is bufferized in place, so the later tensor.extract on %5 observes the stored 1. In the pass2 output, by contrast, the buffer is copied before the store, so the final load reads the untouched (uninitialized) %alloc and happens to print 0.
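Reduced to the essential pattern, pass1's output degenerates to a single buffer shared by %5 and %inserted_28. A sketch of the effect (function and value names here are mine, condensed from the pass1 output above):

func.func @in_place_effect() -> i32 {
  %c0 = arith.constant 0 : index
  %c1_i32 = arith.constant 1 : i32
  // One allocation backs both %5 and %inserted_28:
  %buf = memref.alloc() : memref<1xi32>
  // tensor.insert became an in-place store into that shared buffer...
  memref.store %c1_i32, %buf[%c0] : memref<1xi32>
  // ...so the tensor.extract on %5 observes the store and yields 1.
  %r = memref.load %buf[%c0] : memref<1xi32>
  return %r : i32
}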

If I replace the tensor.empty with tensor.from_elements, or simply wrap the tensor.empty in a function, the modified MLIR program produces the same result under both pass sequences. The modified program:

module {
  func.func @gen_tensor_i32() -> tensor<1xi32> {
    %c0_i32 = arith.constant 0 : i32
    %5 = tensor.empty() : tensor<1xi32>
    return %5 : tensor<1xi32>
  }
  func.func @tensor_i32(%arg0: tensor<1xi32>) -> i32 {
    %idx0 = index.constant 0
    %0 = tensor.extract %arg0[%idx0] : tensor<1xi32>
    return %0 : i32
  }
  func.func @func1() {
    %c1_i32 = arith.constant 1 : i32
    %c0_i32 = arith.constant 0 : i32
    %c0 = arith.constant 0 : index
    %5 = call @gen_tensor_i32() : () -> tensor<1xi32>
    // %5 = tensor.empty() : tensor<1xi32> // using empty
    // %5 = tensor.from_elements %c0_i32 : tensor<1xi32>

    %inserted_28 = tensor.insert %c1_i32 into %5[%c0] : tensor<1xi32>
    %31 = call @tensor_i32(%inserted_28) : (tensor<1xi32>) -> i32
    %308 = tensor.extract %5[%c0] : tensor<1xi32>
    // vector.print %31 : i32
    vector.print %308 : i32
    return
  }
}
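For completeness, the other workaround (the commented-out line above) replaces tensor.empty with tensor.from_elements, which materializes a tensor with defined contents. A minimal self-contained sketch of that variant (function name is mine):

func.func @from_elements_variant() -> i32 {
  %c0 = arith.constant 0 : index
  %c0_i32 = arith.constant 0 : i32
  %c1_i32 = arith.constant 1 : i32
  // %t has defined contents (a single 0), unlike the result of tensor.empty.
  %t = tensor.from_elements %c0_i32 : tensor<1xi32>
  // tensor.insert yields a new tensor value; %t itself is unchanged.
  %inserted = tensor.insert %c1_i32 into %t[%c0] : tensor<1xi32>
  // Reading the original %t must yield 0 under any correct bufferization.
  %r = tensor.extract %t[%c0] : tensor<1xi32>
  return %r : i32
}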

I wonder if something is wrong with --sparsification-and-bufferization and tensor.empty. This result inconsistency may not actually be a problem, because tensor.empty should only carry the shape information, not defined contents.
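If that reading is right, the original @func1 reads uninitialized data: tensor.empty only specifies a shape, so a tensor.extract on its result is a read of unspecified contents and any value is admissible. A minimal illustration (function name is mine):

func.func @undef_read() -> i32 {
  %c0 = arith.constant 0 : index
  // tensor.empty defines only the shape; the element values are unspecified.
  %e = tensor.empty() : tensor<1xi32>
  // This read has no defined result: 0 and 1 are both admissible outputs.
  %x = tensor.extract %e[%c0] : tensor<1xi32>
  return %x : i32
}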

git version: 2163ae761808ca0e5478357384f6ddbacce279eb

llvmbot commented 2 months ago

@llvm/issue-subscribers-mlir

Author: anonymous (Anonymous15592)
