llvm / clangir

A new (MLIR based) high-level IR for clang.
https://clangir.org
Other
308 stars 84 forks source link

Case gnu range need to be improved #632

Closed wenpen closed 1 month ago

wenpen commented 1 month ago

Pr #599 added support for gnu range in case statement, but the implementation need to store every integer inside the range.

For example, the following code will produce tons of repeated

  switch(x) {
    case 50 ... 233:
      return 1;
  }
switch i32 %5, label %8 [
    i32 50, label %6
    i32 51, label %6
    i32 52, label %6
    ...
    ...
    i32 231, label %6
    i32 232, label %6
    i32 233, label %6
  ], !dbg !12

While the OG codegen is smart

  switch i32 %0, label %sw.caserange [
  ]

sw.caserange:                                     ; preds = %entry
  %1 = sub i32 %0, 50
  %inbounds = icmp ule i32 %1, 183
  br i1 %inbounds, label %sw.bb, label %sw.epilog
wenpen commented 1 month ago

Hi @bcardosolopes, I'm working on this issue as we talked before, I felt it's not very trivial, so opened this issue to discuss it.

IIUC, the two different codegen pipelines are as below. The original codegen workflow is: SwitchStmt => SwitchInst+BranchInst The cir workflow is: SwitchStmt => cir::SwitchOp => cir::SwitchFlatOp => LLVM::SwitchOp => SwitchInst

So the problem is cir lack the BranchInst information, in my view there are two proposals to solve it

  1. Add attributes in SwitchOp to record the range information, and lower it to BranchInst. Then we need to modify all the three ops, i.e. cir::SwitchOp, cir::SwitchFlatOp and LLVM::SwitchOp.
  2. In CIRSwitchOpFlattening, replace SwitchOp with BrCondOp, just like what CIRLoopOpInterfaceFlattening did. I prefer this solution, as it would be helpful to #522

Do you have some suggestions? I'll start the work if you confirm the solution, thanks!

Lancern commented 1 month ago

In CIRSwitchOpFlattening, replace SwitchOp with BrCondOp, just like what CIRLoopOpInterfaceFlattening did.

Consider this case:

int test(int x) {
  switch(x) {
  case 1000 ... 2000:
    return 1;

  case 3333:
    return 3;
  case 4444:
    return 4;
  case 5555:
    return 5;

  default:
    return 6;
  }
}

The original CodeGen generates:

  switch i32 %0, label %sw.caserange [
    i32 3333, label %sw.bb1
    i32 4444, label %sw.bb2
    i32 5555, label %sw.bb3
  ]

sw.caserange:
  %1 = sub i32 %0, 1000
  %inbounds = icmp ule i32 %1, 1000
  br i1 %inbounds, label %sw.bb, label %sw.default

So to keep the same output code you cannot just replace SwitchOp with BrCondOp. The above code is more or less likely to be transformed into:

int test(int x) {
  switch(x) {
  case 3333:
    return 3;
  case 4444:
    return 4;
  case 5555:
    return 5;

  default:
    if (x >= 1000 && x <= 2000)
      return 1;
    return 6;
  }
}
wenpen commented 1 month ago

@Lancern Your suggestion is great! I'll try this implementation, thank you.

bcardosolopes commented 1 month ago
  1. Add attributes in SwitchOp to record the range information

We need a new switch case that tracks ranges.

... and lower it to BranchInst. Then we need to modify all the three ops, i.e. cir::SwitchOp, cir::SwitchFlatOp and LLVM::SwitchOp.

Changing LLVM::SwitchOp is out of scope.

  1. In CIRSwitchOpFlattening, replace SwitchOp with BrCondOp, just like what CIRLoopOpInterfaceFlattening did. I prefer this solution, as it would be helpful to Assertion failure on switch statement with case label in nested block #522

Having the new range attribute is decoupled from the actual lowering strategy, as @Lancern pointed out - SwitchOp might needed to be lowered to a mix of flat switch and brcond's. I suggest you put a debugger in LLVM traditional codegen and learn how it applies the heuristics, we should do similar in CIR, but likely as part of CIRSwitchOpFlattening, as you noticed.