openpower-cores / a2i


Unbalanced instruction fetch latency and instruction fetch width #36

Closed zhaoxiahust closed 3 years ago

zhaoxiahust commented 3 years ago

Dear all,

I noticed that there is a 5-cycle delay, i.e., IU0, IU1, IU2, IU3 and IU4, between thread selection and the instruction being placed in the instruction buffer, where it can be issued. Since we can fetch only 4 instructions per cycle in the ideal case, are there any unbalanced cases here? In particular, if each thread could issue one instruction per cycle in the ideal case, I think the current instruction fetch unit would cause thread starvation, i.e., a thread cannot issue instructions because its instruction buffer is empty. I know this will not happen in A2I, since 4 threads compete for the 2 execution units. Just an architectural design discussion :)

Cheers,

zhaoxiahust commented 3 years ago

I think this also causes a problem when only one thread is running on A2I. I have already verified this through simulation: there are many bubbles in the instruction issue. The pattern looks like "issue, issue, issue, issue, issue, issue, issue, issue, bubble, issue, issue....". Any comments?

openpowerwtf commented 3 years ago

Seems odd. I would think that the IBuffer would be reloaded early enough to keep up with issue.

zhaoxiahust commented 3 years ago

From a single thread's view, the bubbles exist. Fortunately, from the A2 view, I think this will not cause any problems, since there are only two execution units. From the execution units' view, there is always some thread they can receive instructions from, so execution unit utilization is 100%.
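A back-of-the-envelope check of this argument, as a minimal sketch. It assumes the 8-issues-then-1-bubble pattern reported above, that every thread otherwise always has ready instructions, and that the two execution units together accept at most two instructions per cycle. The function names are hypothetical and not taken from the A2I source.

```python
from fractions import Fraction


def per_thread_issue_rate(issues: int, bubbles: int) -> Fraction:
    """Average instructions per cycle one thread can supply (hypothetical helper)."""
    return Fraction(issues, issues + bubbles)


def units_saturated(threads: int, rate: Fraction, units: int) -> bool:
    """True if the threads together can supply at least `units` instructions per cycle."""
    return threads * rate >= units


# Pattern from the simulation described in this thread: 8 issues, then 1 bubble.
rate = per_thread_issue_rate(issues=8, bubbles=1)

# Four threads supply 4 * 8/9 = 32/9 instructions per cycle on average,
# which exceeds what two execution units can consume.
four_threads_ok = units_saturated(threads=4, rate=rate, units=2)

# A single thread supplies only 8/9 per cycle, so its bubbles stay visible.
single_thread_rate = rate
```

Under these assumptions the per-thread bubbles are hidden whenever at least three threads are active, since 3 * 8/9 is still above 2.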

zhaoxiahust commented 3 years ago

One instruction fetch can bring back 4 instructions. The IBuffer has only 8 entries, so the thread selection algorithm is carefully designed to reload early while avoiding overflow. In some cases, once a thread has been selected to issue an instruction fetch request, it has to wait 5 cycles before it can fetch again, which causes one bubble after every 8 issues.
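To illustrate how a restriction on re-selecting the same thread turns into issue bubbles, here is a toy single-thread model. It is only a sketch under stated assumptions, not the A2I thread selection logic: the fetch width (4), IBuffer size (8) and fetch latency (5) come from this thread, while `refetch_gap`, the overflow check and the function name `simulate` are assumptions made for illustration. The exact bubble pattern in the real design depends on the actual selection algorithm, so this model is not expected to reproduce the 8-issue/1-bubble trace.

```python
from collections import deque


def simulate(cycles, fetch_width=4, buffer_entries=8,
             fetch_latency=5, refetch_gap=1):
    """Toy model: one thread, one issue slot per cycle (all names hypothetical).

    refetch_gap is the assumed minimum number of cycles between two
    fetch selections of the same thread.
    """
    buffer = 0            # instructions currently in the IBuffer
    inflight = deque()    # arrival cycles of fetches still in the pipeline
    next_fetch = 0        # earliest cycle the thread may be selected again
    trace = []
    for cycle in range(cycles):
        # Fetched instructions arriving this cycle land in the buffer.
        while inflight and inflight[0] == cycle:
            inflight.popleft()
            buffer += fetch_width
        # Issue one instruction if any is available, else record a bubble.
        if buffer > 0:
            buffer -= 1
            trace.append("issue")
        else:
            trace.append("bubble")
        # Select the thread for a new fetch only if the buffer cannot
        # overflow, counting instructions that are still in flight.
        reserved = buffer + fetch_width * len(inflight)
        if cycle >= next_fetch and reserved + fetch_width <= buffer_entries:
            inflight.append(cycle + fetch_latency)
            next_fetch = cycle + refetch_gap
    return trace


# Re-selection allowed every cycle: issue keeps up after warm-up.
pipelined = simulate(60, refetch_gap=1)

# Re-selection only every 5 cycles: periodic bubbles appear.
gapped = simulate(60, refetch_gap=5)
```

In this toy model, fully pipelined fetch shows no bubbles after the initial 5-cycle fill, while a 5-cycle re-selection gap gives 4 issues followed by 1 bubble. The qualitative point is that the bubbles come from the re-selection restriction rather than from the 5-cycle latency alone.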

openpowerwtf commented 3 years ago

Seems like it should always be prefetching until the ops are taken, especially since it knows how many threads are enabled. Or it could hold the latched read data from the array until it can be taken by the IB, or until it is overwritten/invalidated by some other real demand request.

zhaoxiahust commented 3 years ago

Yes, there are many ways to handle this. Since it is not a problem for the current A2 from the system view when four threads are running together, I guess that's why the designers did not add extra logic.