[CIR][CodeGen] Goto pass

gitoleg commented 3 months ago

This PR: 1) adds new operations: GotoOp and LabelOp and inserts them in the codegen 2) adds a pass that replaces goto operations with branches to the corresponded blocks (and erases LabelOp from CIR) 3) adds an attribute for FuncOp to track which functions should be processed by the pass 4) adds several tests for goto-s

gitoleg commented 3 months ago

@bcardosolopes Bunch of questions from my side.
As far I understand, Symbol trait requires us to add SymbolTable trait to the parent operation: error: 'cir.label' op symbol's parent must have the SymbolTable trait, which can be cir.func, cir.if, cir.switch ... I tried to add SymbolTable for cir.func and faced with the next error: error: 'cir.func' op Operations with a 'SymbolTable' must have exactly one block

So, Am I doing something wrong?

Speaking about cir.goto verification. We could add an array of string attributes for cir.func and verify that the given label for cir.goto is in there. Or what else we could do? I suspect we don't want to search for the label in a function during the verification stage.

bcardosolopes commented 3 months ago

So, Am I doing something wrong?

Tricky, I don't think you are doing anything wrong, it's just the way it works, looks like =(

Speaking about cir.goto verification. We could add an array of string attributes for cir.func and verify that the given label for cir.goto is in there. Or what else we could do? I suspect we don't want to search for the label in a function during the verification stage.

Sounds like a good idea. Translating in practical terms:

Have a StringRef attribute list in cir.func.
Both cir.goto and cir.label have one attribute, which is a mlir::IntAttr index into that list.

This means the source of truth becomes the list (not allowing it to be innaccurate with the rest of usage) and make verifiers pretty straightforward:

cir.func: the list is all elements StringRef.
cir.goto/cir.label: the index must be less than the size of the list.

Wdyt?

bcardosolopes commented 3 months ago

Oh, and we should probably mark goto and label ops to have side-effects, so they don't get removed in the future when we start using the canonicalizer.

gitoleg commented 3 months ago

@bcardosolopes

cir.func: the list is all elements StringRef. cir.goto/cir.label: the index must be less than the size of the list.

It's doable! But I would preserve label names in the function body, it's more readable from my point of view. Well, custom printing will help us here, but I'm afraid we'll have a problem with parsing: no way to translate string into index - we need to get a parent operation somehow for the operation we're currently trying to create.

Of course it's up to you :) just let me know how do you see it after all.

Oh, and we should probably mark goto and label ops to have side-effects, so they don't get removed in the future when we start using the canonicalizer.

May I ask for a hint? ) Tried several traits with no luck. BTW, current goto lowering tests (not mine ones) use the canonicalize option - so we either need to fix it or add a trait

bcardosolopes commented 3 months ago

It's doable! But I would preserve label names in the function body, it's more readable from my point of view. Well, custom printing will help us here, but I'm afraid we'll have a problem with parsing: no way to translate string into index - we need to get a parent operation somehow for the operation we're currently trying to create.

Definitely more readable. But if we only use the index, we don't have the parsing problem and we don't even need custom print/parsing: the verifier is the one checking if the "access" makes sense, and at that point one can get the FuncOp to look into the list. (unless I'm missing something?)

Of course it's up to you :) just let me know how do you see it after all.

I'm open, and I'm usually "all for" more readable. One possible (and nice) approach would be to introduce two new traits: Label and LabelUse, and use that to abstract our StringRefs, just like Symbol does. It's could lead to a nice verifier and the GotoSolver can operate on top of those interfaces instead of looking into operations and StringRefs. Thoughts? One additional idea is to look into LLVM IR dialect and see how they make sure some of the ops aren't removed.

May I ask for a hint? ) Tried several traits with no luck. BTW, current goto lowering tests (not mine ones) use the canonicalize option - so we either need to fix it or add a trait

Which ones did you try? My suggestions would be to see users of mlir/include/mlir/Interfaces/SideEffectInterfaceBase.td (more in mlir/test/lib/Dialect/Test/TestInterfaces.td) and either use something already defined or define some custom thing on top of those base classes for CIR. If we implement Label and LabelUse perhaps we could do those in terms of SideEffect or SideEffectsTraitBase.

Actually, we had a discussion about this in todays meeting, one good idea we got from @orbiri was to always add tests for new operations to make sure the canonicalizer won't remove them (similar to what we do with invalid.cir for parsing errors) - at least for new operations added we'll make sure that those aren't easily deletable. I think we should do that and start with both cir.label and cir.goto.

bcardosolopes commented 3 months ago

Btw, @lanza rebased over the weekend, sorry for the churn. Can you rebase this PR again?

gitoleg commented 3 months ago

@bcardosolopes

while it still in progress (I tried a solution with indexes and didn't fix tests so far) we have something to discuss.

speaking about Canonicalizer pass, the AlwaysSpeculatable for labels does the job - I'm not sure though it's a good solution.
Note, that looks like the pass is so greedy that removes unreachable blocks - meaning once the block is reachable only through goto - it will be removed in our case :(

One additional idea is to look into LLVM IR dialect and see how they make sure some of the ops aren't removed.

The problem is that LabelOp doesn't have a result so it will never used later. And doesn't have memory access to tag op with MemoryEffect or something - so it's still a question how to handle the LabelOp properly. Maybe we could try to find a workaround here, e.g. use hasCanonicalizer = 1 and set an empty set of patterns for the LabelOp.

I'm open, and I'm usually "all for" more readable.

I have one more idea. What if we will use strings in label/goto operations and have a list of strings as FuncOp attribute? in this case we'll preserve the readability and will be able to verify easily - e.g. if the label name is in the parent functions list of labels, not just an index is in bounds.

One possible (and nice) approach would be to introduce two new traits: Label and LabelUse

I will try to take a look at, but at the first glance it looks it will complicate all of this, or it's a wrong impression? Like Symbol trait and SymbolUser requires SymbolTable, and Label and LabelUse will require something similar. Though m aybe I still don't understand your idea.

So what do you think? I would do it in a more simple way, but if it's a wrong approach - let's try to go with the one you've suggested.

bcardosolopes commented 3 months ago

speaking about Canonicalizer pass, the AlwaysSpeculatable for labels does the job - I'm not sure though it's a good solution. Note, that looks like the pass is so greedy that removes unreachable blocks - meaning once the block is reachable only through goto - it will be removed in our case :(

Oh, that's really aggressive. Not sure I have a solution for that just yet, please add this information in one the existing issues so we can keep track of it. It also might be worth posting on discourse/MLIR and ask around if that's a known issue or if there's any trick we could do.

Btw, I remember that some passes have a configuration struct you can pass in that diminishes the aggressiviness, have you look into that for the canonicalizer? Perhaps there's a mode that isn't that aggressive with unrecheable blocks.

The problem is that LabelOp doesn't have a result so it will never used later. And doesn't have memory access to tag op with MemoryEffect or something - so it's still a question how to handle the LabelOp properly. Maybe we could try to find a workaround here, e.g. use hasCanonicalizer = 1 and set an empty set of patterns for the LabelOp.

I'd be fine with marking it with memory effect or something else that holds it down until we find a proper solution. We just need to make sure to document this enough so at some point we can work with MLIR upstream for a proper solution.

I have one more idea. What if we will use strings in label/goto operations and have a list of strings in the function attribute? in this case we'll preserve the readability and will able to verify easily - e.g. if the label name is in the parent functions list of labels, not just an index is in bounds.

I'm a bit unconfortable to introduce the same string 3 times, seems a bit odd - 2 times is probably enough (if we have to do it in label and goto).

I will try to take a look at, but at the first glance it looks it will complicate all of this, or it's a wrong impression? Like Symbol trait and SymbolUser requires SymbolTable, and Label and LabelUse will require something similar. Though m aybe I still don't understand your idea.

Why Label and LabelUse require something similar? We define the interface/trait and its semantics. But maybe we could explore something along these lines in the future, where we might need other operations to use labels.

So what do you think? I would do it in a more simple way, but if it's a wrong approach - let's try to go with the one you've suggested.

Simple is fine, goto/label have each one StringRef, no FuncOp level lists. Perhaps the verification should be done in FuncOp, so you can collect labels for all cir.gotos and cir.labels and make sure they match in one place. Let's keep this simple until its proven that this is actually affecting compile times.

bcardosolopes commented 3 months ago

Looks like some stuff is still failing, I tried to solve the merge conflict myself, perhaps it was that. Let me know once you update it and I'll land it

gitoleg commented 3 months ago

@bcardosolopes Sorry for the delay! Was overloaded for some time( So, I added verification to the FuncOp, added tests and solved the merge conflicts.

Btw, I remember that some passes have a configuration struct you can pass in that diminishes the aggressiviness, have you look into that for the canonicalizer? Perhaps there's a mode that isn't that aggressive with unrecheable blocks.

Speaking about unreachable blocks removal, I didn't understand how to feed command line parameters to the canonicalizer pass, but I found a weird solution!) Look:

cir-opt --pass-pipeline='builtin.module(cir-to-llvm,canonicalize{region-simplify=false})'

This runs the canonicalizer with disabled simplification of regions, and runs cir-to-llvm as well.

bcardosolopes commented 3 months ago

Sorry for the delay! Was overloaded for some time( So, I added verification to the FuncOp, added tests and solved the merge conflicts.

Thank you!

Speaking about unreachable blocks removal, I didn't understand how to feed command line parameters to the canonicalizer pass, but I found a weird solution!) Look:
cir-opt --pass-pipeline='builtin.module(cir-to-llvm,canonicalize{region-simplify=false})' 
This runs the canonicalizer with disabled simplification of regions, and runs cir-to-llvm as well.

I saw that and thought it was pretty clever haha, I didn't know how to achieve that. I believe it's the best we can do right now. Can you update one of the issues and mention how you applied the workaround? Might be useful information to share. I also noticed you broke it down into clang/test/CIR/Lowering/region-simplify.cir, a comment there would be great to explain the purpose of that test, but it can come next.

Thanks for working on this, it's a very important piece of missing codegen, I'm very happy to see this landing.

gitoleg commented 3 months ago

@bcardosolopes As you asked, I created a new issue that basically is an experience sharing :) #600

Speaking about the clang/test/CIR/Lowering/region-simplify.cir - the point is that the test itself checks something that is differ from its old name (it was goto.cir) - it checks that the unreachable blocks will be removed by the canonicalizer pass, and there is no goto stuff in there (though I assume the code was obtained from C with goto and labels). Also, I wasn't able to leave this test in the same file (with goto operations testing) and just add another prefix for my own tests - the pure -canonicalize with no extra parameters will cause fail here for the rest of the tests. I mean they just can't live together in the same file. So I decided to move the previous test into another file. Let me know if I did something wrong and/or you thing the new file should be named differently.

bcardosolopes commented 3 months ago

they just can't live together in the same file. So I decided to move the previous test into another file

Yep, I got that and it's all good.

Let me know if I did something wrong and/or you thing the new file should be named differently.

I was just curious about the specifics of what you were expecting to be tested by the canonicalizer, hence my comment about adding a comment in the test file. Not really necessary, but could be good in case we need to rewrite it or delete it at some point

llvm / clangir

[CIR][CodeGen] Goto pass #562