kaist-cp / cs420

KAIST CS420: Compiler Design (2023 Spring)
423 stars 28 forks source link

[HW4] Question about `joins` #183

Closed pr0cf5 closed 4 years ago

pr0cf5 commented 4 years ago

I have a few questions about the joins and jointable data structures.

Here's how I understood it:

joins is a hashmap that stores the DF(X), DF^2(X), ... DF^n(X) for all promotable stores. jointable is a data structure that encapsulates joins and domtree for efficient use. It also has a cache mechanism for fast search. jointable.lookup: a method that does the following. (for simplicity, I stripped away the caching logic)

restart:
if joins.contains(aid, bid):
    return bid
else:
    bid = idom(bid)
    goto restart

Question1

Let's imagine the following situation. graph

In this case, joins will only contain (C, l0) because DF(A)={C}, DF(B)={C}, DF(C)={}, DF(C)={}, DF(D)={}. Therefore the jointable.lookup for the load instruction in block E will return C. But the instruction in E must use the end value in block D. What's wrong with my thoughts here?

Question2

Why are the data structure named in such ways? What is join'ed exactly?

The concept of domination/phinodes/register promotion is quite new and difficult to me, so I wanted to understand the concept deeply before writing any code. Thank you!

jeehoonkang commented 4 years ago
pr0cf5 commented 4 years ago

Thank you for the answer.

I have just one more question.

It is not, so we just deduce that the last stored value at E's immediate dominator, namely D, should be the load result.

But in the video, it seems that instead of fetching the end value from the immediate dominator, it fetches the end value from the closes dominator which is a join point.

In the very first run, self.inner must be empty in the following code. So the return value must be a block that is in self.joins, because that is the only path that breaks out of the loop. Therefore, in the case above, I think lookup will return C instead of D. (It goes 'up' once and then returns)

lookup

So I think this code can be buggy in some situations. Could you confirm it?

jeehoonkang commented 4 years ago

@pr0cf5 Thank you for bringing up the issue. I confirm that it's a bug in our solution.

Thinking again, I believe the join table is not worth it for the purpose of this course. It's introduced to optimize the performance by caching the lookup results. But let's say we don't care the performance of the compiler itself doesn't matter.

The correct algorithm should be:

I hope this answers your question. If you have further questions, feel free to reply to this thread.

pr0cf5 commented 4 years ago

Thank you so much for answering our questions during the weekend. Although there is not much left of it, I hope you have a great weekend.

The 4-lined summary is very helpful for constructing the optimization pass. I think this should be seen to other students as well, would it be okay to keep it open for a while?

ahcheongL commented 4 years ago

Would the grading scheme also be changed according to the changed algorithm? I think phinodes will be inserted in different locations with the new algorithm...

jeehoonkang commented 4 years ago

@LockOne The "old" algorithm is actually buggy, so you need to implement the "new" algorithm discussed in this issue thread.

ahcheongL commented 4 years ago

I interpreted the old algorithm as it wanted to insert phinodes as little as possible (for optimization), but it seems the new algorithm does not care about the optimization of inserting phinodes.

And it seems the existing example follows the old algorithm's inserting scheme.

Here's an example from examples/mem2reg/mem2reg.input.ir

input: (Focusing on %l1:i16:y)

block b0:
  ...
  %b0:i3:i16 = load %l1:i16*
  %b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:i3:i16)
  %b0:i5:unit = store 1:i16 %l1:i16*
  ...
  switch undef:i32 default b1() [
    2:i32 b2()
    3:i32 b3()
  ]

block b1:
  ... (do nothing on %l1:i16)
  j b2()

block b2:
  ...
  %b2:i3:i16 = load %l1:i16*
  %b2:i4:unit = call @sink:[ret:unit params:(i32)]*(%b2:i3:i16)
  %b2:i5:unit = store 201:i16 %l1:i16*
  j b3()

expected output: (from examples/mem2reg/mem2reg.output.ir)

block b0:
  %b0:p0:i32:x
  %b0:p1:i16:y
  ...
  %b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:p1:i16)
  ...
  switch undef:i32 default b1() [
    2:i32 b2(0:i32)
    3:i32 b3()
  ]

block b1:
  ...
  j b2(100:i32)

block b2:
  %b2:p0:i32:x  //This x is not relevant to %l1:y
  ...
  %b2:i4:unit = call @sink:[ret:unit params:(i32)]*(1:i16)
  %b2:i5:unit = nop
  j b3()

The operand of %b2:i4 (which was the value of local variable %l1) is changed to simple constant 1 because the idom of b2 (= b0) stores constant 1 to the local variable %l1.

But, with the new algorithm, I think the output should be the following code :

block b0:
  %b0:p0:i32:x
  %b0:p1:i16:y
  ...
  %b0:i4:unit = call @sink:[ret:unit params:(i32)]*(%b0:p1:i16)
  ...
  switch undef:i32 default b1() [
    2:i32 b2(0:i32, 1:i16)
    3:i32 b3()
  ]

block b1:
 ...
  j b2(100:i32, 1:i16)

block b2:
  %b2:p0:i32:x
  %b2:p1:i16:y
  ...
  %b2:i4:unit = call @sink:[ret:unit params:(i32)]*(%b2:p1:i16)
  %b2:i5:unit = nop
  j b3()

Because the new algorithm inserts phinodes as long as the block is a join point. (And b2 is a join point)

Could you confirm this?

Thank you, (+ Is the definition of "joint point" the block with multiple predecessors?) (+ Do we still need the concept of dominance frontier with the new algorithm?)

jeehoonkang commented 4 years ago

The new algorithm still replaces the operand of %b2:i4 with 1 because:

ahcheongL commented 4 years ago

Guess I misunderstood the concept of join. I'll implement it again with the join concept. Thank you

jeehoonkang commented 4 years ago

I believe this issue is now sufficiently advertised :) Thank you all for making such a wonderful discussion thread.

junsooo commented 2 years ago

@minseongg I have question for this issue.

Now i'm understanding the new algorithm:

The correct algorithm should be:

  • consider the current block, say bid.
  • if it has an end value, it's the load result.
  • else if it's a join point, insert a phinode and the phinode is the load result.
  • otherwise, consider bid's immediate dominator and loop from the beginning.

Based on this algorithm:

Q1. So we have two parts using join_table on previous video, and that two parts were actually same. Two parts' meaning is: First is about insert or get end_value on Load instruction, and second is generating phinode recursively part. So i think we can adopt above algorithm on that two same part. Is it correct?

Q2. About this statement: otherwise, consider bid's immediate dominator and loop from the beginning. What i understand loop from the beginning means, we consider the current block, say bid. again but this time, the block is not current block but current block's idom block. Is it correct?

Q3. If Q2 is correct, then I think we have to make some recursive logic because we can't use join table. Recursive logic means, for example the first part is about

for (bid, block) in &code.blocks {
 ...
 Instruction::Load { ptr } => {
   ...
   recursively find proper block to get end_value or insert end_value or phinode_indexes
}

something like that. If we could use join table, then we just lookup from join table and it was ok. Is it correct?

minseongg commented 2 years ago

@junsooo For all questions, you're right.

junsooo commented 2 years ago

@minseongg I have another question.....

Q. For single_block case in mem2reg, I'm trying to use above algorithm.

fun i32 @single_block () {
init:
  bid: b0
  allocations:
    %l0:i32:x

block b0:
  %b0:i0:i32 = load %l0:i32*

  %b0:i1:unit = store 42:i32 %l0:i32*
  %b0:i2:i32 = load %l0:i32*

  %b0:i3:unit = store 37:i32 %l0:i32*
  %b0:i4:i32 = load %l0:i32*

  %b0:i5:unit = call @sink:[ret:unit params:(i32)]*(%b0:i0:i32)
  %b0:i6:unit = call @sink:[ret:unit params:(i32)]*(%b0:i2:i32)
  %b0:i7:unit = call @sink:[ret:unit params:(i32)]*(%b0:i4:i32)

  ret 0:i32
}

So, we have some load instructions and store instructions. First is load instruction. I used above algorithm:

So on first load instruction(%b0:i0:i32 = load %l0:i32*), i failed to run algorithm. What am i missing?

minseongg commented 2 years ago

If you didn't find the end value until the end of the loop, it is loading an undefined variable. Therefore, in that case, you should replace it with undef:i32. Also, it is correct that %b0:i2:i32 has end value 42:i32.