irgen: starter instructions

dannypsnl commented 4 years ago

There must be no non-phi instructions between the start of a basic block and the PHI instructions: i.e. PHI instructions must be first in a basic block.

and the following code can be compiled by llc:

Loop:       ; Infinite loop that counts from 0 on up...
  %indvar = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
  %indvar2 = phi i32 [ 0, %LoopHeader ], [ %nextindvar, %Loop ]
  %nextindvar = add i32 %indvar, 1
  br label %Loop

That means phi is kind of starter instruction, and we can have several starter instructions which always be put at the beginning of the basic block, then we can avoid this kind of code:

firstAppear := loopCtx.NewPhi(ir.NewIncoming(loopCtx.compileExpr(s.InitExpr), ctx.Block))
step := loopCtx.compileExpr(s.Step)
firstAppear.Incs = append(firstAppear.Incs, ir.NewIncoming(step, loopCtx.Block))

We can do like this:

step := loopCtx.compileExpr(s.Step)
firstAppear := loopCtx.NewPhi(
    ir.NewIncoming(loopCtx.compileExpr(s.InitExpr), ctx.Block),
    ir.NewIncoming(step, loopCtx.Block))

but won't break order we expected, and we would have new BasicBlock definition.

type Block struct {
    Starters []Starter
}

mewmew commented 4 years ago

I know phi instructions are a bit difficult to deal with, especially as they often have to reference LLVM IR values which have not yet been defined.

I think these kind of checks belong to two places, namely as a validation step in an irsem package (validation of LLVM IR semantics), and as part of an irutil package.

The intention is still to keep the core llir/llvm package to a minimum; where instructions are instructions and terminators are terminators. Then the irsem and irutil packages may help validate the constructed LLVM IR and introduce API to make it easier to create valid LLVM IR.

Note, in prior versions of llir/llvm we had these validity checks earlier (e.g. type checking during construction of instructions), but this API failed to allow recursive definition cases and other special cases where type checking had to be deferred until the entire IR was constructed. For those reasons, I think it's best to keep special handling of phi instructions targetted for irsem and irutil packages.

What do you think?

Cheers, Robin

dannypsnl commented 4 years ago

I understand your consideration, but here is not actually about validating, the real problem is operating on instruction is hard for the user. Most user won't read source code and follow several internal conventions when modifying existed instruction, if we let them look into implementation is even more harmful since implementation might change, I would tend to avoid such situation. I also check again, I believe only phi need to locate at the beginning of the block, and can't appear in the first block in a function.

language reference: function

The first basic block in a function is special in two ways: it is immediately executed on entrance to the function, and it is not allowed to have predecessor basic blocks (i.e. there can not be any branches to the entry block of a function). Because the block can have no predecessors, it also cannot have any PHI nodes.

Hence, I still would like to implement this one to avoid leak of abstraction and have more free for future changes.

mewmew commented 4 years ago

I understand your consideration, but here is not actually about validating, the real problem is operating on instruction is hard for the user.

I agree with you that for most users and use cases, creating phi instructions is difficult, and for this reason, the general recommendation for front-end compiler developers is to instead rely on the -mem2reg optimization pass of LLVM and model local variables using alloca and then use load and store to read and assign values to these local variables.

This was mentioned in the "LLVM IR and Go" post on Gopher Academy.

To simplify the implementation of LLVM front-ends, one recommendation is to model local variables in the source language as memory allocated variables (using alloca), model assignments to local variables as store to memory, and uses of local variables as load from memory. The reason for this is that it may be non-trivial to directly translate a source language into LLVM IR in SSA-form. As long as the memory accesses follows certain patters, we may then rely on the mem2reg LLVM optimization pass to translate memory allocate local variables to registers in SSA-form (using phi nodes where necessary).

I still understand your preference to change the API. Sorry to be so resistant in this regard. The main reason is still that I believe a really good LLVM IR builder API is yet to be defined. And indeed, llir/llvm/ir should be considered a low-level package. The intention in the future is that users should have a higher level package for LLVM IR construction with which they interact. For this reason, llir/llvm/ir exposes all internal fields of its structure types, and this is by design, so that there may exist a high level LLVM IR construction library; e.g. irgen which has the capability of constructing arbitrary LLVM IR modules.

Similarly to the irutil package, the "best" API for the irgen package is not yet known and we would have to do experimentation. Once we arrive at a solid API for IR generation, such a package can be promoted to llir/irgen.

For background reference, these topics were discussed and brought up in https://github.com/llir/llvm/issues/3#issuecomment-70849690

My current feeling is that there will exist a low-level IR library with concrete types, maybe hidden as an internal package. One or more high-level packages will build abstractions on this low-level package, mainly because the abstractions that feels good to one user may feel bad to another.

Indeed, the current ir package is too low-level for any user (except package developers). Basically the low-level package will give you enough power to shoot yourself in the foot, while the high-level package will prevent the majority of the common mistakes by design (e.g. move/remove a BasicBlock, but forget to update references to it, etc as you already mentioned).

Cheers, Robin

dannypsnl commented 4 years ago

Ok, wrapping on llir was also a good choice, I would make some experiments irgen once I have time.

mewmew commented 4 years ago

Ok, wrapping on llir was also a good choice, I would make some experiments irgen once I have time.

That's great! :)

dannypsnl commented 2 years ago

In https://pkg.go.dev/github.com/dannypsnl/extend@v0.2.0#ExtBlock.NewPhi you can get exactly what I described above.

initExpr := ...
step := ...
extBlock.NewPhi(
    ir.NewIncoming(initExpr, nextBlock),
    ir.NewIncoming(step, loopBlock),
)

In above code, phi instruction be moved to the top of the block automatically!

So I'm going to close this due to the solution existing. Feel free to reopen if you want anything more~~~

dannypsnl commented 2 years ago

@mewmew sorry that I reopen the issue, so after some thought, I still want to port this behavior back to ir.Block. Fortunately, Starter is not required! Because only phi instructions are in front of the block. Thus, I want to define:

type Block struct {
    Phis []*InstPhi
    Insts []Instruction
}

Now,

print Phis then print Insts
NewPhi(incs ...*ir.Incoming) *ir.InstPhi puts incs into Phis

The benefit is that

keep order of Phis by call order of NewPhi
Users still can operate Insts, just also can operate Phis
All APIs can be used like before

mewmew commented 2 years ago

@dannypsnl, would block.Insts still include PHI instructions (but require that they are first in the list)? Or would PHI instructions only be included in the block.Phis list?

dannypsnl commented 2 years ago

PHI instructions will only be in block.Phis

mewmew commented 2 years ago

PHI instructions will only be in block.Phis

Thanks for clarifying. Given that we would no longer place PHI instructions in block.Insts, then the change is an API breaking change.

Any code like this would stop working:

for _, inst := range block.Insts {
   switch inst := inst.(type) {
   case *ir.InstPhi:
      // operate on PHI instruction.
   }
}

Therefore, we have to think hard before introducing such a change, due to API breakage.

If the benefit is determined to outweigh the drawbacks of API breakage, then the change may still be introduced (we are still in v0.x).

I will try to give it some thought, but I am leaning towards having the llir/llvm/ir package be low-level, and have other IR builder packages and IR semantic validation packages ensure correct semantics and facilitate correct usage (as further elaborated in https://github.com/llir/llvm/issues/153#issuecomment-720180241).

Cheers, Robin

dannypsnl commented 2 years ago

If we only do this on v0.4?

mewmew commented 2 years ago

If we only do this on v0.4?

I am having doubts since all existing code would stop working (if handling PHI nodes using block.Insts), but still compile successfully. Thus, if someone upgrades from v0.3 to v0.4 and uses code such as that presented in https://github.com/llir/llvm/issues/153#issuecomment-992749985, their code would compile successfully, but silently produce incorrect results during runtime (since the block.Insts would no longer contain any PHI instructions).

This is very subtle and would be confusion for users of llir/llvm/ir.

I agree that having PHI nodes distinct from block.Insts does make sense, since they have a slightly different semantic than regular instructions, being that all PHI nodes must come before any other instruction any given basic block.

However, making such a change now is risky, as it could cause a lot of code to break by our users, even if they don't realize it (since the code would still compile).

So, for the time being, I'm in favour of letting other irutil libraries and IR generation libraries handle PHI instructions in another way, but let them be regular instructions in llir/llvm/ir.

Note, this is a very good case to check for in the new irsem package. Run a pass through all basic blocks of all functions of a given module and report if there are any PHI instructions after any non-PHI instructions in a given basic block. If so, report an error.

Cheers, Robin

dannypsnl commented 2 years ago

However, making such a change now is risky, as it could cause a lot of code to break by our users, even if they don't realize it (since the code would still compile).

I think this is fine, their program would be fine with dependencies lock(this is why we versioning).

Rather than check ideas for a long time and users still suffering from the current codebase, my preferred release solution is

release break changes
but if any break changes happen, we do releasing immediately
if not, we keep waiting

Thus, we provide a promise: you only have one thing to change in the compiler for every update. For users that didn't use the package for a long time, there has no reason to believe they will go back without relearning. For users that actively using the package, this flow should be enough.

And notice this proposal is not just since we could go wrong, but also for easier to write IR correct! I think form code to be correct at the beginning is better than check them later.

mewmew commented 2 years ago

I agree with your reasoning.

The thing that scares me is that some users may simply do go get -u ./... to update dependencies, and then do go build in their project to see if it still compiles. And if it compiles, assume that it should probably work. Unless they take the time to read our specific release notes (which they may not think to do, since their project still compiles successfully after update), they would have introduced a subtle bug into their program by updating.

Otherwise, I agree that it is good practice to make the API easier to use correctly (and more difficult to use incorrectly), so if we would have thought about the Phis []*ir.InstPhi field when first designing the ir.Block struct, then the choice would have been easier.

So, I still have to give this some thought.

It may be possible that we do update, and then write a tool to send PRs to all known open source projects that need to be updated, e.g. by writing our own go fix rule. If we did this, then the API breakage concern would be lower.

Cheers, Robin

dannypsnl commented 2 years ago

Just a hint: https://github.com/search?q=%27case+*ir.InstPhi%27+language%3AGo&type=code we actually can detect some effect range?

llir / llvm

irgen: starter instructions #153