Closed yaqiz01 closed 7 years ago
Case 3.1:
MetaPipe A { i =>
MetaPipe B { j =>
Pipe C { k => // CU1
sram1(i) = i
}
Pipe D { g => // CU2
sram2(i) = i
} // CU2
}
}
Pipe E { h => // CU3
... = sram1(h)
... = sram2(h)
}
In this case i
is used for write address for both sram1 and sram2, since for writer address counter chain copying, the copied counter chain has to represent the state of writer. According to case 1, A.cchain would be copied at both Pipe C and Pipe D. Does that mean we need to make two distinct copy of A.cchain at PipeE, one chained with C.cchain.copy and the other chained with D.cchain.copy at CU3?
If it's pure data flow, there's no guarantee that i
(A.cchain.copy) at PipeC and PipeD would be the same.
@dkoeplin, you don't need to worry about the other additional copies of counter chain that you don't generate in spatial. But if we need to make two copies of A.cchain at PipeE, it's probably better to do it more explicitly since it's quite nasty for me to make another copy of A.cchain and swap the usage of the other copy in write address stages, sram write port, etc. @raghup17
In this particular example, Metapipe B is the only stage in Metapipe A. This indicates that 'i' will be the same across Pipe C and Pipe D. So, technically, the following should suffice for this example:
Summarized (in ASCII art) below:
+---------------+ +------------------+
| Pipe C | | Pipe E |
| | | |
| | | +-------+ |
| | data | |sram1 | |
| C k i+--------> | | |
| | | | | +-------+ |
| B j | | |
| | | | | +-------+ |
| A i | | |sram2 | |
+---------------+ | | | |
+---------------+ data | +-------+ |
| Pipe D +--------> |
| || | A i |
| || +----^-------------+
| i++ |
| D g | |
| | | | control |
| B j+---------------------+
| | | |
| A i |
+---------------+
I haven't given much thought into (if and) how this can be generalized into a rule.
However, I see your point; with a slightly modified example: Case 3.2
MetaPipe A { i =>
Pipe C { k => // CU1
sram1(i) = i
}
Pipe D { g => // CU2
sram2(i) = i
} // CU2
}
Pipe E { h => // CU3
... = sram1(h) // Needs A -- C counter chain
... = sram2(h) // Needs A -- D counter chain
}
In this example, 'i' is definitely different at Pipe C and Pipe D. Currently, we would have to create two counter chains (A--C and A--D) like you pointed out.
Wow how did you draw that graph?
Even with original example, there's no guarantee that i in Pipe C and Pipe D will be the same because in Pipe C, k - j - i Pipe D, g - j - i k and g can have different range and Pipe C and Pipe D can have different number of stages/single iteration latency If Pipe C and Pipe D are tile load, there's difference in offchip load latencies as well. At the moment I'm just throwing exception on this. Have to think how to express this in PIR more explicitly
If it's last iteration of j before incrementing i in both C and D, the token guarantees k and g starts at the same time, but when they saturate at different time, j would saturate at different time, which would increment i at different time, is this possible?
Actually that's a decent point. Is the saturation of the C's copy of A's counterchain depending only on it's local copies of chains B and C? Or does it also receive a control signal from D to regulate when local A increments?
So that's the first case, when copied outer controller counter chain is used for local computation, such as read address or simply using the value, it's only locally chained, which doesn't depends on whether the original outer controller is enabled. The reason is that this local copy has to represent the reader's state. For example,
MetaPipe A { i =>
Pipe B { j =>
// use i
}
Pipe C { k =>
// use i
}
}
In this case i is used at B and C, but since it's MetaPipeline , value of i is pipelined in B and C. So we can't just route original i's enable from C to B
Wait, for Case 2.2,
MetaPipe A { i =>
MetaPipe B { j =>
Pipe C { k => // CU1
sram(i) = ...
}
Pipe D {} // CU2
}
}
Pipe E { h => // CU3
... = sram(h)
}
Is there a reason that I can't just make a copy of i at E and route i's enable to E without j and k? since we only care about the writer's state
In the current scheme, parent counter copies are updated based on local copies of its children.
Think I didn't consider all cases above. To clarify:
Case 2.2: Enable signal of 'i' is currently high only for a single cycle when 'j' saturates, as it will be chained with 'j'. However, if you meant that we can route the token corresponding to Metapipe A and Metapipe B into Pipe E, we have to consider the 'write enable' signal. Specifically, depending on what is being written to sram in Pipe C, it may or may not be okay to use Metapipe A's token to drive 'i'. to E. Short reason: 'write enable' signal.
When a counter saturates, it currently wraps around to the min value (0) and stops. Consider the two cases: 2.2.1:
Pipe C { k => // CU1
sram(i) = i * k
}
Here, the write data depends on 'k' (common case IMO). If the write enable signal is high for as long as Metapipe A is running, because 'k' wraps around on saturation, we may end up writing garbage to sram(i).
Case 2.2.2
Pipe C { k => // CU1
sram(i) = i
}
Here, writes to sram are loop invariant in Pipe C. In such cases, it is fine to not duplicate 'k's counter in E.
Case 3.1: I agree that copies of 'i' and 'j' update at different times in Pipe C and Pipe D. In particular, copies of 'i' and 'j' in Pipe C are incremented before copies in Pipe D.
Specifically, when 'j' saturates in Pipe C, it increments its local 'i', but does not begin execution. When 'j' saturates in Pipe D later, it increments its local copy of 'i', but does not begin execution. At this point, we send a token that begins the execution of Metapipe B. Pipe C and Pipe D begin execution again with the same value of 'i'.
Hypothetically, say we duplicate the two chains (i-j-k and i-j-g) in Pipe E. This would work, no doubt. It's just that when 'j' and 'g' are enabled in E, the value of 'i' in both duplicated chains will be the same always.
Case 3.2: Currently, we would definitely have to duplicate both chains (i-k and i-g) in Pipe E.
Have I missed anything?
1 .
Case 3.1
Specifically, when 'j' saturates in Pipe C, it increments its local 'i', but does not begin execution. When 'j' saturates in Pipe D later, it increments its local copy of 'i', but does not begin execution. At this point, we send a token that begins the execution of Metapipe B. Pipe C and Pipe D begin execution again with the same value of 'i'.
Even though during execution of Pipe C and Pipe D i
would be the same, since write enable is on all the time, won't we write incorrect value when PipeC and PipeD are not sync on i and when not executing if we use one of their copy in Pipe E only?
2 .
I'm a bit lost on Case 2.2.1 and Case 2.2.2. For Pipe C in both cases, it's using parent counter i for local computation, so it would look like k - j.copy - i.copy inside Pipe C in both cases. So one way in Pipe E for remote write is do k.copy - j.copy - i.copy, would this be different than have only i.copy and sync the i.copy in Pipe C directly with i.copy in Pipe E. When I said routing the enable, I meant route whatever that's connected to the enable of i.copy inside PipeC, which is done of j.copy in PipeC to Pipe E directly
CounterChain Copying Rule
Case 1: Copy ancestor CounterChain if it's used in descendant Inner Pipe Example:
In this case A.cchain and B.cchain, are both copied at CU1 from CU2 even though only i is used at PipeD. D.cchain, B.cchain.copy and A.cchain.copy are chained together in order to drive A.cchain.copy at CU1
Case 2: Copy Sram writer's CounterChain for write addr calculation
Case 2.1. If writer's CounterChain is the innermost Pipeline's CounterChain, route enable of the copied CounterChain to reader of the Sram Example:
PipeE needs value of k, so C.cchain is copied at Pipe E and enable of C.cchain is routed to E
Case 2.2. If copied CounterChain is from OuterController O that is sram writer's ancestor, copy the descendant cchain of O in branch that reaches the Innermost writer of the sram Example:
In this case
i
is needed atPipeE
, soC.cchain
,B.cchain
, andA.cchain
are copied atPipeE
.C.cchain.copy
,B.cchain.copy
andA.cchain.copy
are chained together to driveA.cchain.copy
atPipeE
. @raghup17