[HW4] Question about `joins`

pr0cf5 commented 4 years ago

I have a few questions about the joins and jointable data structures.

Here's how I understood it:

joins is a hashmap that stores the DF(X), DF^2(X), ... DF^n(X) for all promotable stores. jointable is a data structure that encapsulates joins and domtree for efficient use. It also has a cache mechanism for fast search. jointable.lookup: a method that does the following. (for simplicity, I stripped away the caching logic)

restart:
if joins.contains(aid, bid):
    return bid
else:
    bid = idom(bid)
    goto restart

Question1

Let's imagine the following situation. graph

In this case, joins will only contain (C, l0) because DF(A)={C}, DF(B)={C}, DF(C)={}, DF(C)={}, DF(D)={}. Therefore the jointable.lookup for the load instruction in block E will return C. But the instruction in E must use the end value in block D. What's wrong with my thoughts here?

Question2

Why are the data structure named in such ways? What is join'ed exactly?

The concept of domination/phinodes/register promotion is quite new and difficult to me, so I wanted to understand the concept deeply before writing any code. Thank you!

jeehoonkang commented 4 years ago

The name "join" comes from the fact that, e.g., C is the join point of A and B. In more general, DF(X) U DF(DF(X)) U ... are the (possibly recursive) join points of X.
The algorithm for deducing the value for a load instruction in E should be as follows.
- First, see if the load can be forwarded from an earlier store in E; if so, the latest stored value is the load result.
- Otherwise, see if E is a join point. It is not, so we just deduce that the last stored value at E's immediate dominator, namely D, should be the load result.
- If E were a join point, then a phinode should be inserted and the phinode values are the last values of the predecessors.
The explanation above may differ from our solution in the video. But I hope the above explanation helps you start working on homework 4.

pr0cf5 commented 4 years ago

Thank you for the answer.

I have just one more question.

It is not, so we just deduce that the last stored value at E's immediate dominator, namely D, should be the load result.

But in the video, it seems that instead of fetching the end value from the immediate dominator, it fetches the end value from the closes dominator which is a join point.

In the very first run, self.inner must be empty in the following code. So the return value must be a block that is in self.joins, because that is the only path that breaks out of the loop. Therefore, in the case above, I think lookup will return C instead of D. (It goes 'up' once and then returns)

lookup

So I think this code can be buggy in some situations. Could you confirm it?

jeehoonkang commented 4 years ago

@pr0cf5 Thank you for bringing up the issue. I confirm that it's a bug in our solution.

Thinking again, I believe the join table is not worth it for the purpose of this course. It's introduced to optimize the performance by caching the lookup results. But let's say we don't care the performance of the compiler itself doesn't matter.

The correct algorithm should be:

consider the current block, say bid.
if it has an end value, it's the load result.
else if it's a join point, insert a phinode and the phinode is the load result.
otherwise, consider bid's immediate dominator and loop from the beginning.

I hope this answers your question. If you have further questions, feel free to reply to this thread.

pr0cf5 commented 4 years ago

Thank you so much for answering our questions during the weekend. Although there is not much left of it, I hope you have a great weekend.

The 4-lined summary is very helpful for constructing the optimization pass. I think this should be seen to other students as well, would it be okay to keep it open for a while?

ahcheongL commented 4 years ago

Would the grading scheme also be changed according to the changed algorithm? I think phinodes will be inserted in different locations with the new algorithm...

jeehoonkang commented 4 years ago

@LockOne The "old" algorithm is actually buggy, so you need to implement the "new" algorithm discussed in this issue thread.

ahcheongL commented 4 years ago

I interpreted the old algorithm as it wanted to insert phinodes as little as possible (for optimization), but it seems the new algorithm does not care about the optimization of inserting phinodes.

And it seems the existing example follows the old algorithm's inserting scheme.

Here's an example from examples/mem2reg/mem2reg.input.ir

input: (Focusing on %l1:i16:y)

block b0:
  ...
  %b0:i3:i16 = load %l1:i16*
  %b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:i3:i16)
  %b0:i5:unit = store 1:i16 %l1:i16*
  ...
  switch undef:i32 default b1() [
    2:i32 b2()
    3:i32 b3()
  ]

block b1:
  ... (do nothing on %l1:i16)
  j b2()

block b2:
  ...
  %b2:i3:i16 = load %l1:i16*
  %b2:i4:unit = call @sink:[ret:unit params:(i32)]*(%b2:i3:i16)
  %b2:i5:unit = store 201:i16 %l1:i16*
  j b3()

expected output: (from examples/mem2reg/mem2reg.output.ir)

block b0:
  %b0:p0:i32:x
  %b0:p1:i16:y
  ...
  %b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:p1:i16)
  ...
  switch undef:i32 default b1() [
    2:i32 b2(0:i32)
    3:i32 b3()
  ]

block b1:
  ...
  j b2(100:i32)

block b2:
  %b2:p0:i32:x  //This x is not relevant to %l1:y
  ...
  %b2:i4:unit = call @sink:[ret:unit params:(i32)]*(1:i16)
  %b2:i5:unit = nop
  j b3()

The operand of %b2:i4 (which was the value of local variable %l1) is changed to simple constant 1 because the idom of b2 (= b0) stores constant 1 to the local variable %l1.

But, with the new algorithm, I think the output should be the following code :

block b0:
  %b0:p0:i32:x
  %b0:p1:i16:y
  ...
  %b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:p1:i16)
  ...
  switch undef:i32 default b1() [
    2:i32 b2(0:i32, 1:i16)
    3:i32 b3()
  ]

block b1:
 ...
  j b2(100:i32, 1:i16)

block b2:
  %b2:p0:i32:x
  %b2:p1:i16:y
  ...
  %b2:i4:unit = call @sink:[ret:unit params:(i32)]*(%b2:p1:i16)
  %b2:i5:unit = nop
  j b3()

Because the new algorithm inserts phinodes as long as the block is a join point. (And b2 is a join point)

Could you confirm this?

Thank you, (+ Is the definition of "joint point" the block with multiple predecessors?) (+ Do we still need the concept of dominance frontier with the new algorithm?)

jeehoonkang commented 4 years ago

The new algorithm still replaces the operand of %b2:i4 with 1 because:

Consider b2. It doesn't have an end value and it's not a join point. So consider the immediate dominator b1.
Consider b1. It doesn't have an end value and it's not a join point. So consider the immediate dominator b0.
Consider b0. It has the end value 1.

ahcheongL commented 4 years ago

Guess I misunderstood the concept of join. I'll implement it again with the join concept. Thank you

jeehoonkang commented 4 years ago

I believe this issue is now sufficiently advertised :) Thank you all for making such a wonderful discussion thread.

junsooo commented 2 years ago

@minseongg I have question for this issue.

Now i'm understanding the new algorithm:

The correct algorithm should be:

consider the current block, say bid.

if it has an end value, it's the load result.

else if it's a join point, insert a phinode and the phinode is the load result.

otherwise, consider bid's immediate dominator and loop from the beginning.

Based on this algorithm:

Q1. So we have two parts using join_table on previous video, and that two parts were actually same. Two parts' meaning is: First is about insert or get end_value on Load instruction, and second is generating phinode recursively part. So i think we can adopt above algorithm on that two same part. Is it correct?

Q2. About this statement: otherwise, consider bid's immediate dominator and loop from the beginning. What i understand loop from the beginning means, we consider the current block, say bid. again but this time, the block is not current block but current block's idom block. Is it correct?

Q3. If Q2 is correct, then I think we have to make some recursive logic because we can't use join table. Recursive logic means, for example the first part is about

for (bid, block) in &code.blocks {
 ...
 Instruction::Load { ptr } => {
   ...
   recursively find proper block to get end_value or insert end_value or phinode_indexes
}

something like that. If we could use join table, then we just lookup from join table and it was ok. Is it correct?

minseongg commented 2 years ago

@junsooo For all questions, you're right.

junsooo commented 2 years ago

@minseongg I have another question.....

Q. For single_block case in mem2reg, I'm trying to use above algorithm.

fun i32 @single_block () {
init:
  bid: b0
  allocations:
    %l0:i32:x

block b0:
  %b0:i0:i32 = load %l0:i32*

  %b0:i1:unit = store 42:i32 %l0:i32*
  %b0:i2:i32 = load %l0:i32*

  %b0:i3:unit = store 37:i32 %l0:i32*
  %b0:i4:i32 = load %l0:i32*

  %b0:i5:unit = call @sink:[ret:unit params:(i32)]*(%b0:i0:i32)
  %b0:i6:unit = call @sink:[ret:unit params:(i32)]*(%b0:i2:i32)
  %b0:i7:unit = call @sink:[ret:unit params:(i32)]*(%b0:i4:i32)

  ret 0:i32
}

So, we have some load instructions and store instructions. First is load instruction. I used above algorithm:

consider the current block, say bid. -> (b0)
if it has an end value, it's the load result. -> (%b0:i0:i32 doesn't have end value, but %b0:i2:i32 has end value which is 42:i32. By the way, is this correct?)
else if it's a join point, insert a phinode and the phinode is the load result. (b0 is not join point, and self.joins' value at this point is: {0: {}}.)
otherwise, consider bid's immediate dominator and loop from the beginning. (There is no b0's immediate dominator. domtree.idom(bid) is None. So i cannot loop from the beginning.)

So on first load instruction(%b0:i0:i32 = load %l0:i32*), i failed to run algorithm. What am i missing?

minseongg commented 2 years ago

If you didn't find the end value until the end of the loop, it is loading an undefined variable. Therefore, in that case, you should replace it with undef:i32. Also, it is correct that %b0:i2:i32 has end value 42:i32.

kaist-cp / cs420