Closed pr0cf5 closed 4 years ago
The name "join" comes from the fact that, e.g., C is the join point of A and B. In more general, DF(X) U DF(DF(X)) U ... are the (possibly recursive) join points of X.
The algorithm for deducing the value for a load instruction in E should be as follows.
The explanation above may differ from our solution in the video. But I hope the above explanation helps you start working on homework 4.
Thank you for the answer.
I have just one more question.
It is not, so we just deduce that the last stored value at E's immediate dominator, namely D, should be the load result.
But in the video, it seems that instead of fetching the end value from the immediate dominator, it fetches the end value from the closes dominator which is a join point.
In the very first run, self.inner
must be empty in the following code. So the return value must be a block that is in self.joins
, because that is the only path that breaks out of the loop. Therefore, in the case above, I think lookup
will return C instead of D. (It goes 'up' once and then returns)
So I think this code can be buggy in some situations. Could you confirm it?
@pr0cf5 Thank you for bringing up the issue. I confirm that it's a bug in our solution.
Thinking again, I believe the join table is not worth it for the purpose of this course. It's introduced to optimize the performance by caching the lookup results. But let's say we don't care the performance of the compiler itself doesn't matter.
The correct algorithm should be:
bid
.bid
's immediate dominator and loop from the beginning.I hope this answers your question. If you have further questions, feel free to reply to this thread.
Thank you so much for answering our questions during the weekend. Although there is not much left of it, I hope you have a great weekend.
The 4-lined summary is very helpful for constructing the optimization pass. I think this should be seen to other students as well, would it be okay to keep it open for a while?
Would the grading scheme also be changed according to the changed algorithm? I think phinodes will be inserted in different locations with the new algorithm...
@LockOne The "old" algorithm is actually buggy, so you need to implement the "new" algorithm discussed in this issue thread.
I interpreted the old algorithm as it wanted to insert phinodes as little as possible (for optimization), but it seems the new algorithm does not care about the optimization of inserting phinodes.
And it seems the existing example follows the old algorithm's inserting scheme.
Here's an example from examples/mem2reg/mem2reg.input.ir
input: (Focusing on %l1:i16:y)
block b0:
...
%b0:i3:i16 = load %l1:i16*
%b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:i3:i16)
%b0:i5:unit = store 1:i16 %l1:i16*
...
switch undef:i32 default b1() [
2:i32 b2()
3:i32 b3()
]
block b1:
... (do nothing on %l1:i16)
j b2()
block b2:
...
%b2:i3:i16 = load %l1:i16*
%b2:i4:unit = call @sink:[ret:unit params:(i32)]*(%b2:i3:i16)
%b2:i5:unit = store 201:i16 %l1:i16*
j b3()
expected output: (from examples/mem2reg/mem2reg.output.ir)
block b0:
%b0:p0:i32:x
%b0:p1:i16:y
...
%b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:p1:i16)
...
switch undef:i32 default b1() [
2:i32 b2(0:i32)
3:i32 b3()
]
block b1:
...
j b2(100:i32)
block b2:
%b2:p0:i32:x //This x is not relevant to %l1:y
...
%b2:i4:unit = call @sink:[ret:unit params:(i32)]*(1:i16)
%b2:i5:unit = nop
j b3()
The operand of %b2:i4 (which was the value of local variable %l1) is changed to simple constant 1 because the idom of b2 (= b0) stores constant 1 to the local variable %l1.
But, with the new algorithm, I think the output should be the following code :
block b0:
%b0:p0:i32:x
%b0:p1:i16:y
...
%b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:p1:i16)
...
switch undef:i32 default b1() [
2:i32 b2(0:i32, 1:i16)
3:i32 b3()
]
block b1:
...
j b2(100:i32, 1:i16)
block b2:
%b2:p0:i32:x
%b2:p1:i16:y
...
%b2:i4:unit = call @sink:[ret:unit params:(i32)]*(%b2:p1:i16)
%b2:i5:unit = nop
j b3()
Because the new algorithm inserts phinodes as long as the block is a join point. (And b2 is a join point)
Could you confirm this?
Thank you, (+ Is the definition of "joint point" the block with multiple predecessors?) (+ Do we still need the concept of dominance frontier with the new algorithm?)
The new algorithm still replaces the operand of %b2:i4
with 1 because:
b2
. It doesn't have an end value and it's not a join point. So consider the immediate dominator b1
.b1
. It doesn't have an end value and it's not a join point. So consider the immediate dominator b0
.b0
. It has the end value 1.Guess I misunderstood the concept of join. I'll implement it again with the join concept. Thank you
I believe this issue is now sufficiently advertised :) Thank you all for making such a wonderful discussion thread.
@minseongg I have question for this issue.
Now i'm understanding the new algorithm:
The correct algorithm should be:
- consider the current block, say bid.
- if it has an end value, it's the load result.
- else if it's a join point, insert a phinode and the phinode is the load result.
- otherwise, consider bid's immediate dominator and loop from the beginning.
Based on this algorithm:
Q1. So we have two parts using join_table on previous video, and that two parts were actually same. Two parts' meaning is: First is about insert or get end_value on Load instruction, and second is generating phinode recursively part. So i think we can adopt above algorithm on that two same part. Is it correct?
Q2.
About this statement:
otherwise, consider bid's immediate dominator and loop from the beginning.
What i understand loop from the beginning means, we consider the current block, say bid.
again but this time, the block is not current block but current block's idom block. Is it correct?
Q3. If Q2 is correct, then I think we have to make some recursive logic because we can't use join table. Recursive logic means, for example the first part is about
for (bid, block) in &code.blocks {
...
Instruction::Load { ptr } => {
...
recursively find proper block to get end_value or insert end_value or phinode_indexes
}
something like that. If we could use join table, then we just lookup from join table and it was ok. Is it correct?
@junsooo For all questions, you're right.
@minseongg I have another question.....
Q. For single_block case in mem2reg, I'm trying to use above algorithm.
fun i32 @single_block () {
init:
bid: b0
allocations:
%l0:i32:x
block b0:
%b0:i0:i32 = load %l0:i32*
%b0:i1:unit = store 42:i32 %l0:i32*
%b0:i2:i32 = load %l0:i32*
%b0:i3:unit = store 37:i32 %l0:i32*
%b0:i4:i32 = load %l0:i32*
%b0:i5:unit = call @sink:[ret:unit params:(i32)]*(%b0:i0:i32)
%b0:i6:unit = call @sink:[ret:unit params:(i32)]*(%b0:i2:i32)
%b0:i7:unit = call @sink:[ret:unit params:(i32)]*(%b0:i4:i32)
ret 0:i32
}
So, we have some load instructions and store instructions. First is load instruction. I used above algorithm:
So on first load instruction(%b0:i0:i32 = load %l0:i32*), i failed to run algorithm. What am i missing?
If you didn't find the end value until the end of the loop, it is loading an undefined variable. Therefore, in that case, you should replace it with undef:i32
. Also, it is correct that %b0:i2:i32
has end value 42:i32
.
I have a few questions about the
joins
andjointable
data structures.Here's how I understood it:
joins
is a hashmap that stores the DF(X), DF^2(X), ... DF^n(X) for all promotable stores.jointable
is a data structure that encapsulatesjoins
anddomtree
for efficient use. It also has a cache mechanism for fast search.jointable.lookup
: a method that does the following. (for simplicity, I stripped away the caching logic)Question1
Let's imagine the following situation.
In this case,
joins
will only contain (C, l0) because DF(A)={C}, DF(B)={C}, DF(C)={}, DF(C)={}, DF(D)={}. Therefore thejointable.lookup
for the load instruction in block E will return C. But the instruction in E must use the end value in block D. What's wrong with my thoughts here?Question2
Why are the data structure named in such ways? What is
join
'ed exactly?The concept of domination/phinodes/register promotion is quite new and difficult to me, so I wanted to understand the concept deeply before writing any code. Thank you!