Closed ChuanqiXu9 closed 3 weeks ago
The term "region" refers to exactly the same thing as it does in MLIR. Regions model syntactical scopes and structured control flow. I believe you're right that the current design of cir.goto
/ cir.label
is flawed if anyone tries to use them to model jumps across scopes (i.e. non-structured control flow). Let's consider the following simple case:
int test(int x) {
if (x)
goto label;
{
x = 10;
label:
return x;
}
}
We get the following ClangIR output:
cir.func @_Z4testi(%arg0: !s32i) -> !s32i {
%0 = cir.alloca !s32i, !cir.ptr<!s32i>, ["x", init] {alignment = 4 : i64}
%1 = cir.alloca !s32i, !cir.ptr<!s32i>, ["__retval"] {alignment = 4 : i64}
cir.store %arg0, %0 : !s32i, !cir.ptr<!s32i>
cir.scope { // REGION #1
%2 = cir.load %0 : !cir.ptr<!s32i>, !s32i
%3 = cir.cast(int_to_bool, %2 : !s32i), !cir.bool
cir.if %3 {
cir.goto "label"
}
}
cir.scope { // REGION #2
%2 = cir.const #cir.int<10> : !s32i
cir.store %2, %0 : !s32i, !cir.ptr<!s32i>
cir.br ^bb1
^bb1: // pred: ^bb0
cir.label "label"
%3 = cir.load %0 : !cir.ptr<!s32i>, !s32i
cir.store %3, %1 : !s32i, !cir.ptr<!s32i>
%4 = cir.load %1 : !cir.ptr<!s32i>, !s32i
cir.return %4 : !s32i
}
cir.unreachable
}
The cir.goto
operation there actually jumps to a different region in its middle, which contradicts with the fundamental assumption that each region starts execution from its entry block [1]. This violation may lead to false analysis and mis-optimizations. For example, the dominance analysis could incorrectly conclude that x=10
dominates return x
, and optimizations would then theoretically be able to optimize the code into return 10
.
[1] https://mlir.llvm.org/docs/LangRef/#control-flow-and-ssacfg-regions
However, when control flow enters a region, it always begins in the first block of the region, called the entry block.
The term "region" refers to exactly the same thing as it does in MLIR. Regions model syntactical scopes and structured control flow. I believe you're right that the current design of
cir.goto
/cir.label
is flawed if anyone tries to use them to model jumps across scopes (i.e. non-structured control flow). Let's consider the following simple case:int test(int x) { if (x) goto label; { x = 10; label: return x; } }
We get the following ClangIR output:
cir.func @_Z4testi(%arg0: !s32i) -> !s32i { %0 = cir.alloca !s32i, !cir.ptr<!s32i>, ["x", init] {alignment = 4 : i64} %1 = cir.alloca !s32i, !cir.ptr<!s32i>, ["__retval"] {alignment = 4 : i64} cir.store %arg0, %0 : !s32i, !cir.ptr<!s32i> cir.scope { // REGION #1 %2 = cir.load %0 : !cir.ptr<!s32i>, !s32i %3 = cir.cast(int_to_bool, %2 : !s32i), !cir.bool cir.if %3 { cir.goto "label" } } cir.scope { // REGION #2 %2 = cir.const #cir.int<10> : !s32i cir.store %2, %0 : !s32i, !cir.ptr<!s32i> cir.br ^bb1 ^bb1: // pred: ^bb0 cir.label "label" %3 = cir.load %0 : !cir.ptr<!s32i>, !s32i cir.store %3, %1 : !s32i, !cir.ptr<!s32i> %4 = cir.load %1 : !cir.ptr<!s32i>, !s32i cir.return %4 : !s32i } cir.unreachable }
The
cir.goto
operation there actually jumps to a different region in its middle, which contradicts with the fundamental assumption that each region starts execution from its entry block [1]. This violation may lead to false analysis and mis-optimizations. For example, the dominance analysis could incorrectly conclude thatx=10
dominatesreturn x
, and optimizations would then theoretically be able to optimize the code intoreturn 10
.[1] https://mlir.llvm.org/docs/LangRef/#control-flow-and-ssacfg-regions
However, when control flow enters a region, it always begins in the first block of the region, called the entry block.
Thanks. And do we have plans or ideas to fix such problems? Or do we have to continue by ignoring the use of goto
?
And do we have plans or ideas to fix such problems?
Nope.
do we have to continue by ignoring the use of goto?
It works now because we don't have much analysis and optimizations on the structured CIR yet, and the problem does not exist anymore after we flatten the CIR. But once we start incorporating more analysis and optimizations on the non-flat CIR the current design may introduce troubles.
Given the fact we have to support goto, maybe the simplest workaround we can do may be: add a hasGoto
(or may-broken-cfg) attribute to the FuncOp, then the passes on structured CIR can skip such functions. Although we can it finer grained (by providing APIs to detect goto when computing dominances), that may be harder and not so efficiency.
Another approach: we run a special "goto-flattening" function pass immediately after CIRGen. For each cir.goto
/ cir.label
pair, the pass finds the innermost region that contains both of the cir.goto
and the cir.label
operations, and then flattens that region, and replaces cir.goto
and cir.label
with jumps between basic blocks accordingly.
Let me clarify a few observations:
goto
is within the same scope (intrascope), we currently emit a cir.br
for it.goto
happen across scope, we emit it's "symbolic" version, which is the cir.goto "label"
we are talking about. This is by design, since we want to preserve structural control flow at this point. The Goto Solver pass then finally does the right patching. The documentation should state this better though, it's not obvious what the expectations are.cir.goto label
things, or communicate invalidation back when those operations are found. cir.break
and other ops have similar issues. Ideally we should have an interface to these operations, called IllegalBranchSource
, and passes can check for those in more generic ways (cc @sitio-couto and I talked about this at some point, in case he has more insights).Ok, tried to clarify a bit more: https://github.com/llvm/clangir/commit/0891e6f00c9dd1d389e7ca1c4a3cf0a3a7bf656f
Also note that
This is by design, since we want to preserve structural control flow at this point.
comes with a tradeoff, e.g. you cannot do accurate modeling of all basic blocks interaction. But if you need an analysis that requires that, it will have to be run late in the pipeline (once flat CIR gives full visibility and completely models the correct final control-flow).
Hey @ChuanqiXu9, I think I can provide a bit more insight here.
CIR doesn't fully adhere to MLIR regions' rules in some cases. MLIR's SSACFG regions (which CIR uses) are Single-Entry-Multiple-Exit (SEME). This term can be misleading because it refers to multiple exit statements but not multiple exit destinations; a branching operation can only transfer control flow to the parent operation or a basic block within the same region.
However, CIR regions don't strictly follow these rules. Operations like cir.break
, cir.continue
, cir.goto
, and cir.return
often break them. For example, a cir.return
within a cir.if
yields control flow not to its parent operation but all the way up to a cir.func
. Because of this, I somewhat agree with your second interpretation: the semantics of 'region' in CIR partially differs from MLIR's.
MLIR region's rules make control flow predictable and easy for the compiler to navigate, enabling generic control-flow passes that work on any dialect. So why doesn't CIR follow these rules? There are several reasons. Code generation would become more difficult, diverging from the original and requiring significant bookkeeping. Creating operations with regions that strictly follow these rules would make them too complex and harder to read. Abandoning regions altogether conflicts with CIR's main goal of providing a high-level IR for C/C++ and brushes aside one of MLIR's best abstractions. We don't currently need MLIR's generic passes, so supporting them isn't a priority; when we do, we can transform the IR later in the pipeline to conform to these rules (e.g., the flattening pass). And, most importantly, there is an ongoing work in MLIR to add control-flow support for these kinds of operations.
Regarding goto
s, they present a similar problem but are also in a league of their own since they may branch to wherever they please. The ongoing work I mentioned above does not include goto
s. Given MLIR's design to facilitate compiler reasoning about control flow, I doubt there will ever be built-in goto
support. Instead, we need to work around this while leveraging MLIR's infrastructure as much as possible, which is what the implementation @bcardosolopes described does.
As for your switch-case implementation, keep in mind that CIR's main goal is to be a high-level IR for C/C++, and goto
s are rare (I hope). We should try to preserve useful high-level information and leverage MLIR's high-level abstractions as much as possible, which is why I recommend going for a region-based implementation.
@bcardosolopes @sitio-couto thanks for the great answers! They do help! I think I can close the issue.
I know the concept of 'region' comes from MLIR directly. And I am developing https://github.com/llvm/clangir/issues/522 and the concept of 'region' in CIR makes me confusing about how to handle cases. Given this is not only limited to switch, I tried to open a new issue for this.
In my mind, a region is a group of blocks, where these blocks has a single entry and a single exit. And I am not sure if it is still true in CIR due to the existence of
goto
. I didn't see our handling ofgoto
changes any region information. So if I didn't misunderstand,goto
breaks the general semantics of 'region'. And I hope to get some input on this topic.I feel the possible answers may be:
goto
some day. iii.goto
is the devil and we don't know how to fix it. We will get what we get.goto
?CC: @Lancern @wenpen @gitoleg @bcardosolopes