Closed richardyrh closed 4 months ago
@richardyrh Can you check that performance is not affected by this change?
The code also fixes a bug in mvout from a spad address. The FP tiled matmul perf seems to be the same.
Yes, performance seems to be unaffected (well, it's an infinite% speedup, since previously the whole thing would just lock up). I do not have perms to merge this; perhaps one of you could do it, or grant me the permissions.
When the `write_issue_q` dequeue is not valid, the random `.bits` may still indicate an acc address or may have the garbage bit set. This erroneously causes the read pipeline output ready to go low, which combinationally causes the SRAM read response interface to deassert ready. As a result, reads from the SRAMs stall for a cycle; however, writeData.valid does not accommodate this stall, as it expects a continuous stream of data. Every once in a while this leaves the write queues with one extra data element that is never dequeued; the queues fill up over time, which leads to a deadlock. The fix gates the random bits with the `write_issue_q` output valid.
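For anyone skimming, here is a rough behavioral sketch of the gating in plain Python. It is not the actual RTL: `QueueHead`, `ready_ungated`, `ready_gated`, and `spurious_stalls` are made-up names, and the real ready signal presumably has other terms that are left out here. It only illustrates why ignoring valid lets garbage bits pull ready low.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class QueueHead:
    """Hypothetical model of the write_issue_q output (names are illustrative)."""
    valid: bool    # True only when the head holds a real entry
    is_acc: bool   # bits: entry targets an accumulator address
    garbage: bool  # bits: entry has its garbage bit set


def ready_ungated(head: QueueHead) -> bool:
    # Assumed pre-fix behaviour: bits are inspected even when valid is low.
    return not (head.is_acc or head.garbage)


def ready_gated(head: QueueHead) -> bool:
    # Assumed post-fix behaviour: bits only count when the head is valid.
    return not (head.valid and (head.is_acc or head.garbage))


def spurious_stalls(ready_fn) -> int:
    # Count the invalid-head bit patterns that pull ready low.
    heads = (QueueHead(False, a, g) for a, g in product([False, True], repeat=2))
    return sum(1 for h in heads if not ready_fn(h))


if __name__ == "__main__":
    print("ungated:", spurious_stalls(ready_ungated))
    print("gated:  ", spurious_stalls(ready_gated))
```

With an invalid head, the ungated version drops ready for any bit pattern that looks like an acc or garbage address, while the gated version never does. For valid heads the two versions agree, which is consistent with performance being unaffected.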