llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.79k stars 11.44k forks source link

[mlir] Using `convert-scf-to-cf` and `test-print-liveness` together cause segment fault #60909

Open Jokeren opened 1 year ago

Jokeren commented 1 year ago

The problem only appears if we use a single command line on the IR below. Instead, if we first do convert-scf-to-cf, copy the output IR to a file, and then apply test-print-liveness on the file, we don't observe the problem. (Please bear with me on the weird variable names; it's modified from triton IR)

We found one possible explanation. Maybe isBeforeInBlock should add an edge case for other == this because updateOrderIfNecessary is expected to only be invoked on blocks with more than one. After skipping updateOrderIfNecessary when other == this, we did solve the problem but it doesn't explain why separating convert-scf-to-cf and test-print-liveness could avoid the problem.

https://github.com/llvm/llvm-project/blob/fd5d92e6220905f7d942a81108266d427babe143/mlir/lib/IR/Operation.cpp#L274 https://github.com/llvm/llvm-project/blob/fd5d92e6220905f7d942a81108266d427babe143/mlir/lib/IR/Operation.cpp#L304

cc @ptillet

Tested with f50cad2c55dfab4ff3da010a64119b4a93522289

Reproduced with:

mlir-opt -pass-pipeline="builtin.module(func.func(convert-scf-to-cf), func.func(test-print-liveness))" ./test.mlir
func.func @for_if_for(%lb : index, %ub : index, %step : index, %i1 : i1) {
 %a_shared_init = arith.constant dense<0.00e+00> : tensor<128x32xf16>
 %b_shared_init = arith.constant dense<0.00e+00> : tensor<128x32xf16>
 %c_shared_init = arith.constant dense<0.00e+00> : tensor<128x32xf16>
 %c_blocked = arith.negf %c_shared_init : tensor<128x32xf16>

 %a_shared, %b_shared, %c_shared = scf.for %iv = %lb to %ub step %step iter_args(%a_shared = %a_shared_init, %b_shared = %b_shared_init, %c_shared = %c_shared_init) -> (tensor<128x32xf16>, tensor<128x32xf16>, tensor<128x32xf16>) {
  %c_shared_next_next = scf.if %i1 -> tensor<128x32xf16> {
   %cst0 = arith.constant dense<0.00e+00> : tensor<128x32xf16>
   scf.yield %cst0 : tensor<128x32xf16>
  } else {
   %c_shared_ = scf.for %jv = %lb to %ub step %step iter_args(%c_shared_next = %c_shared) -> (tensor<128x32xf16>) {
    %c_blocked_next = arith.negf %c_shared_next : tensor<128x32xf16>
    scf.yield %c_shared_next : tensor<128x32xf16>
   }
   scf.yield %c_shared_ : tensor<128x32xf16>
  }
  %b_blocked_next = arith.negf %b_shared: tensor<128x32xf16>
  scf.yield %a_shared, %b_shared, %c_shared_next_next : tensor<128x32xf16>, tensor<128x32xf16>, tensor<128x32xf16>
 }
 return
}
llvmbot commented 1 year ago

@llvm/issue-subscribers-mlir

wpmed92 commented 1 year ago

@Jokeren I tested it, and logged out wether the op order is valid in isBeforeInBlock, and what I found is the following:

SUCCESS case (pipeline run in separate steps): // - Block: 4 // --- LiveIn: arg0@0 arg1@0 arg2@0 arg3@0 arg0@1 arg1@1 arg2@1 arg3@1 // --- LiveOut: arg0@0 arg1@0 arg2@0 arg3@0 arg0@1 arg1@1 arg2@1 // --- BeginLivenessIntervals // --- EndLivenessIntervals // --- BeginCurrentlyLive Current op: cf.br START of live range op: cf.br END of live range op: cf.br op order is INVALD in block ...

-> In Operation::isBeforeInBlock the op order is invalid in this case, so it hits the recomputeOpOrder code path, so the code won't hit the else part where updateOrderIfNecessary is called, and so it does not crash.

FAILURE case (full pipeline run): // - Block: 4 // --- LiveIn: arg0@0 arg1@0 arg2@0 arg3@0 arg0@1 arg1@1 arg2@1 arg3@1 // --- LiveOut: arg0@0 arg1@0 arg2@0 arg3@0 arg0@1 arg1@1 arg2@1 // --- BeginLivenessIntervals // --- EndLivenessIntervals // --- BeginCurrentlyLive Current op: cf.br START of live range op: cf.br END of live range op: cf.br

-> Crashes, because op order is valid, so we hit the else part, where updateOrderIfNecessary happens.

I will now look into why is there a difference in op order validity between the two runs.