Closed tdb-alcorn closed 2 years ago
On FPGA:
Configuration (all values in ms) | ResNet20 CIFAR ONNX | YOLOv4 Tiny @416 ONNX | ResNet50 ImageNet ONNX |
---|---|---|---|
A. ZCU104 baseline | 13.099 | 294.071 | 514.279 |
B. A and external AXI4S Width Converter instead of Transmission | 11.168 | 262.804 | 473.956 |
C. B and two-cycle read/write fix | 9.661 | 214.892 | 415.953 |
This change adds transparent queues to all the control queues in the accumulator modules to unblock control flow. When a control queue needs to address more than one subordinate before proceeding, we use a multi enqueue to handle the transaction. This works well when all the subordinates are engaged at the same stage of the data pipeline, but when one subordinate operates at a later stage, it forces the earlier subordinates to wait for it unnecessarily. By adding tiny transparent queues to all the inputs to a multi enqueue, we cut this dependency, because the later subordinate's control flow can simply buffer up. In the future, when we add a multi enqueue that addresses subordinates that might be separated by up to N cycles in the pipeline, we should add a transparent queue of N elements to each control input.
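To make the stall concrete, here is a rough timing sketch in Python. It is a toy model, not the hardware in this change: the function name `last_consume_cycle` and its parameters are hypothetical, and it assumes that the early subordinate is always ready, that the late subordinate can consume a control word only `stage_gap` cycles after the early one accepted it, that a queue slot can be reused in the cycle its word is consumed, and that the multi enqueue moves to the next word the cycle after every port has accepted the current one. Only the queue in front of the late subordinate is modeled, since the early one never blocks here.

```python
def last_consume_cycle(num_words, stage_gap, queue_depth):
    """Cycle at which the late subordinate consumes the final control word.

    queue_depth == 0 models a direct connection (no transparent queue).
    """
    late_consume = []  # cycle at which the late subordinate consumes word i
    offer = 0          # cycle at which the multi enqueue starts offering word i
    for i in range(num_words):
        # Assumption: the early subordinate is always ready, so it accepts
        # the word in the same cycle it is offered.
        early_accept = offer
        # The late subordinate acts stage_gap cycles after the early one,
        # and consumes at most one word per cycle.
        consume = early_accept + stage_gap
        if late_consume:
            consume = max(consume, late_consume[-1] + 1)
        late_consume.append(consume)
        if queue_depth == 0:
            # Direct connection: the late port accepts only when it consumes.
            late_accept = consume
        elif i >= queue_depth:
            # Buffered: there is room once word i - queue_depth has left.
            late_accept = max(offer, late_consume[i - queue_depth])
        else:
            # Buffered and not yet full: accept immediately.
            late_accept = offer
        # The multi enqueue advances the cycle after every port has accepted.
        offer = max(early_accept, late_accept) + 1
    return late_consume[-1]


if __name__ == "__main__":
    for depth in (0, 1, 3):
        cycles = last_consume_cycle(num_words=8, stage_gap=3, queue_depth=depth)
        print(f"queue depth {depth}: last word consumed at cycle {cycles}")
```

Under these assumptions, 8 control words with a 3-cycle gap finish at cycle 31 with no queue, cycle 16 with a 1-element queue, and cycle 10 with a 3-element queue. That matches the guidance above: a queue as deep as the pipeline separation lets the early subordinate run at one word per cycle.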
This change also adds a helpful command to the readme and cleans up a few messy comments.