Closed tdb-alcorn closed 2 years ago
There is Verilator compilation issue with ChiselTest 0.5.3 preventing proper testing with simulation. We reached to Chisel Gitter for help. In absence of simulation we tested on FPGA and confirmed correctness.
On FPGA we observed significant performance improvement (see line D):
ResNet20 CIFAR ONNX | YOLOv4 Tiny @416 ONNX | ResNet50 ImageNet ONNX | |
---|---|---|---|
A. ZCU104 baseline (ms) | 13.099 | 294.071 | 514.279 |
B. A and external AXI4S Width Converter instead of Transmisson | 11.168 | 262.804 | 473.956 |
C. B and two-cycle read/write fix | 9.661 | 214.892 | 415.953 |
D. C and fix for delay between ops | 7.387 | 120.748 | 256.987 |
This change fixes a bug where matmuls and accs would wait on each other instead of proceeding in parallel. The problem was that the control queues feeding them were implicitly linked, meaning that one could not get too far ahead of the other. We add buffers in these places to cut dependencies between buses fed by the same control inputs. In this case the buffer size was sometimes set to 2x the array size to account for the fact that accumulate operations can't begin until data has propagated through the array.
To prevent concurrency bugs from arising due to out of order execution, we also add a lock pool module that manages a set of mutexes that are shared by a number of actors. In our case, the mutexes are locks tied to memory regions (2 locks total) and the actors are the memory ports. The decoder acquires locks when it sends off memory requests, and the locks are automatically released when the appropriate condition is met - in this case when the memory request that needs to lock is actually passed into the memory module.
This change also required upgrading to Chisel 3.5.3 to get around some odd bugs that seemed to be due to the
cloneType
methods, which are no longer necessary in this version. To effect the upgrade, we had to fix some deprecation and naming issues, such as removing unnecessary parentheses from.fire
,.isLit
, or moving the verification statements out of the experimental package. We also had to update tests to reflect the new API of scalatest in the latest version (cosmetic differences only).The meat of the change is in the
LockPool
module so please start there when reviewing.Closing #52 in favour of this new PR
Closes #15