Closed dannypsnl closed 2 years ago
I know phi instructions are a bit difficult to deal with, especially as they often have to reference LLVM IR values which have not yet been defined.
I think these kind of checks belong to two places, namely as a validation step in an irsem
package (validation of LLVM IR semantics), and as part of an irutil
package.
The intention is still to keep the core llir/llvm
package to a minimum; where instructions are instructions and terminators are terminators. Then the irsem
and irutil
packages may help validate the constructed LLVM IR and introduce API to make it easier to create valid LLVM IR.
Note, in prior versions of llir/llvm
we had these validity checks earlier (e.g. type checking during construction of instructions), but this API failed to allow recursive definition cases and other special cases where type checking had to be deferred until the entire IR was constructed. For those reasons, I think it's best to keep special handling of phi instructions targetted for irsem
and irutil
packages.
What do you think?
Cheers, Robin
I understand your consideration, but here is not actually about validating, the real problem is operating on instruction is hard for the user. Most user won't read source code and follow several internal conventions when modifying existed instruction, if we let them look into implementation is even more harmful since implementation might change, I would tend to avoid such situation. I also check again, I believe only phi
need to locate at the beginning of the block, and can't appear in the first block in a function.
The first basic block in a function is special in two ways: it is immediately executed on entrance to the function, and it is not allowed to have predecessor basic blocks (i.e. there can not be any branches to the entry block of a function). Because the block can have no predecessors, it also cannot have any PHI nodes.
Hence, I still would like to implement this one to avoid leak of abstraction and have more free for future changes.
I understand your consideration, but here is not actually about validating, the real problem is operating on instruction is hard for the user.
I agree with you that for most users and use cases, creating phi instructions is difficult, and for this reason, the general recommendation for front-end compiler developers is to instead rely on the -mem2reg optimization pass of LLVM and model local variables using alloca
and then use load
and store
to read and assign values to these local variables.
This was mentioned in the "LLVM IR and Go" post on Gopher Academy.
To simplify the implementation of LLVM front-ends, one recommendation is to model local variables in the source language as memory allocated variables (using alloca), model assignments to local variables as store to memory, and uses of local variables as load from memory. The reason for this is that it may be non-trivial to directly translate a source language into LLVM IR in SSA-form. As long as the memory accesses follows certain patters, we may then rely on the mem2reg LLVM optimization pass to translate memory allocate local variables to registers in SSA-form (using phi nodes where necessary).
I still understand your preference to change the API. Sorry to be so resistant in this regard. The main reason is still that I believe a really good LLVM IR builder API is yet to be defined. And indeed, llir/llvm/ir
should be considered a low-level package. The intention in the future is that users should have a higher level package for LLVM IR construction with which they interact. For this reason, llir/llvm/ir
exposes all internal fields of its structure types, and this is by design, so that there may exist a high level LLVM IR construction library; e.g. irgen
which has the capability of constructing arbitrary LLVM IR modules.
Similarly to the irutil
package, the "best" API for the irgen
package is not yet known and we would have to do experimentation. Once we arrive at a solid API for IR generation, such a package can be promoted to llir/irgen
.
For background reference, these topics were discussed and brought up in https://github.com/llir/llvm/issues/3#issuecomment-70849690
My current feeling is that there will exist a low-level IR library with concrete types, maybe hidden as an internal package. One or more high-level packages will build abstractions on this low-level package, mainly because the abstractions that feels good to one user may feel bad to another.
Indeed, the current ir package is too low-level for any user (except package developers). Basically the low-level package will give you enough power to shoot yourself in the foot, while the high-level package will prevent the majority of the common mistakes by design (e.g. move/remove a BasicBlock, but forget to update references to it, etc as you already mentioned).
Cheers, Robin
Ok, wrapping on llir was also a good choice, I would make some experiments irgen once I have time.
Ok, wrapping on llir was also a good choice, I would make some experiments irgen once I have time.
That's great! :)
In https://pkg.go.dev/github.com/dannypsnl/extend@v0.2.0#ExtBlock.NewPhi you can get exactly what I described above.
initExpr := ...
step := ...
extBlock.NewPhi(
ir.NewIncoming(initExpr, nextBlock),
ir.NewIncoming(step, loopBlock),
)
In above code, phi
instruction be moved to the top of the block automatically!
So I'm going to close this due to the solution existing. Feel free to reopen if you want anything more~~~
@mewmew sorry that I reopen the issue, so after some thought, I still want to port this behavior back to ir.Block
. Fortunately, Starter
is not required! Because only phi
instructions are in front of the block. Thus, I want to define:
type Block struct {
Phis []*InstPhi
Insts []Instruction
}
Now,
Phis
then print Insts
NewPhi(incs ...*ir.Incoming) *ir.InstPhi
puts incs
into Phis
The benefit is that
Phis
by call order of NewPhi
Insts
, just also can operate Phis
@dannypsnl, would block.Insts
still include PHI instructions (but require that they are first in the list)? Or would PHI instructions only be included in the block.Phis
list?
PHI instructions will only be in block.Phis
PHI instructions will only be in block.Phis
Thanks for clarifying. Given that we would no longer place PHI instructions in block.Insts
, then the change is an API breaking change.
Any code like this would stop working:
for _, inst := range block.Insts {
switch inst := inst.(type) {
case *ir.InstPhi:
// operate on PHI instruction.
}
}
Therefore, we have to think hard before introducing such a change, due to API breakage.
If the benefit is determined to outweigh the drawbacks of API breakage, then the change may still be introduced (we are still in v0.x
).
I will try to give it some thought, but I am leaning towards having the llir/llvm/ir
package be low-level, and have other IR builder packages and IR semantic validation packages ensure correct semantics and facilitate correct usage (as further elaborated in https://github.com/llir/llvm/issues/153#issuecomment-720180241).
Cheers, Robin
If we only do this on v0.4?
If we only do this on v0.4?
I am having doubts since all existing code would stop working (if handling PHI nodes using block.Insts
), but still compile successfully. Thus, if someone upgrades from v0.3
to v0.4
and uses code such as that presented in https://github.com/llir/llvm/issues/153#issuecomment-992749985, their code would compile successfully, but silently produce incorrect results during runtime (since the block.Insts
would no longer contain any PHI instructions).
This is very subtle and would be confusion for users of llir/llvm/ir
.
I agree that having PHI nodes distinct from block.Insts
does make sense, since they have a slightly different semantic than regular instructions, being that all PHI nodes must come before any other instruction any given basic block.
However, making such a change now is risky, as it could cause a lot of code to break by our users, even if they don't realize it (since the code would still compile).
So, for the time being, I'm in favour of letting other irutil
libraries and IR generation libraries handle PHI instructions in another way, but let them be regular instructions in llir/llvm/ir
.
Note, this is a very good case to check for in the new irsem
package. Run a pass through all basic blocks of all functions of a given module and report if there are any PHI instructions after any non-PHI instructions in a given basic block. If so, report an error.
Cheers, Robin
However, making such a change now is risky, as it could cause a lot of code to break by our users, even if they don't realize it (since the code would still compile).
I think this is fine, their program would be fine with dependencies lock(this is why we versioning).
Rather than check ideas for a long time and users still suffering from the current codebase, my preferred release solution is
Thus, we provide a promise: you only have one thing to change in the compiler for every update. For users that didn't use the package for a long time, there has no reason to believe they will go back without relearning. For users that actively using the package, this flow should be enough.
And notice this proposal is not just since we could go wrong, but also for easier to write IR correct! I think form code to be correct at the beginning is better than check them later.
I agree with your reasoning.
The thing that scares me is that some users may simply do go get -u ./...
to update dependencies, and then do go build
in their project to see if it still compiles. And if it compiles, assume that it should probably work. Unless they take the time to read our specific release notes (which they may not think to do, since their project still compiles successfully after update), they would have introduced a subtle bug into their program by updating.
Otherwise, I agree that it is good practice to make the API easier to use correctly (and more difficult to use incorrectly), so if we would have thought about the Phis []*ir.InstPhi
field when first designing the ir.Block
struct, then the choice would have been easier.
So, I still have to give this some thought.
It may be possible that we do update, and then write a tool to send PRs to all known open source projects that need to be updated, e.g. by writing our own go fix
rule. If we did this, then the API breakage concern would be lower.
Cheers, Robin
Just a hint: https://github.com/search?q=%27case+*ir.InstPhi%27+language%3AGo&type=code we actually can detect some effect range?
According to language reference: phi:
and the following code can be compiled by
llc
:That means
phi
is kind of starter instruction, and we can have several starter instructions which always be put at the beginning of the basic block, then we can avoid this kind of code:We can do like this:
but won't break order we expected, and we would have new BasicBlock definition.