This change fixes a bug where matmuls and accumulates would wait on each other instead of proceeding in parallel. The control queues feeding them were implicitly linked, so neither could get far ahead of the other. We add buffers at these points to cut the dependencies between buses fed by the same control inputs. In some cases the buffer size is set to 2x the array size, because accumulate operations can't begin until data has propagated through the array.
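
To illustrate the stall, here is a toy cycle-level model in Python. It is only a sketch under simplifying assumptions, not the actual hardware: one control source fans each command out to a matmul queue and an accumulate queue, it can issue only when both queues have space, and an accumulate may start only `array_latency` cycles after its matmul was consumed. The names `simulate`, `queue_depth`, and `array_latency` are hypothetical and chosen for this example.

```python
from collections import deque


def simulate(num_cmds, queue_depth, array_latency):
    """Toy model: return the number of cycles needed to retire num_cmds commands.

    Assumptions (illustrative only): the matmul unit consumes one command per
    cycle, and the accumulate unit may consume a command only array_latency
    cycles after the matching matmul was consumed.
    """
    matmul_q = deque()
    acc_q = deque()
    matmul_cycle = {}  # command id -> cycle at which the matmul unit consumed it
    issued = 0
    acc_done = 0
    cycle = 0

    while acc_done < num_cmds:
        # Accumulate side: the head command waits for data to cross the array.
        if acc_q:
            head = acc_q[0]
            if head in matmul_cycle and cycle >= matmul_cycle[head] + array_latency:
                acc_q.popleft()
                acc_done += 1

        # Matmul side: consume one command per cycle when one is available.
        if matmul_q:
            matmul_cycle[matmul_q.popleft()] = cycle

        # Shared control source: it needs space in BOTH queues to issue, so a
        # full accumulate queue also starves the matmul queue.
        if (issued < num_cmds
                and len(matmul_q) < queue_depth
                and len(acc_q) < queue_depth):
            matmul_q.append(issued)
            acc_q.append(issued)
            issued += 1

        cycle += 1

    return cycle


if __name__ == "__main__":
    array_size = 8  # hypothetical array dimension, also used as the latency
    shallow = simulate(num_cmds=16, queue_depth=1, array_latency=array_size)
    deep = simulate(num_cmds=16, queue_depth=2 * array_size, array_latency=array_size)
    print("depth 1:", shallow, "cycles")
    print("depth 2x array size:", deep, "cycles")
```

In this model, a depth-1 queue forces each matmul to wait for the previous accumulate to drain, while a queue deep enough to cover the array latency lets the matmul side issue every cycle. The 2x factor here simply mirrors the sizing described above; the model does not derive it.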