Closed tdb-alcorn closed 2 years ago
Everything checks out on FPGA. Performance also matches the AXI Width Converter almost exactly. LGTM!
Configuration | ResNet20 CIFAR ONNX | YOLOv4 Tiny @416 ONNX | ResNet50 ImageNet ONNX |
---|---|---|---|
A. ZCU104 baseline (ms) | 13.099 | 294.071 | 514.279 |
B. A and external AXI4S Width Converter instead of Transmission | 11.168 | 262.804 | 473.956 |
C. B and two-cycle read/write fix | 9.661 | 214.892 | 415.953 |
D. C and fix for delay between ops | 7.387 | 120.748 | 256.987 |
E. D with WidthConverter instead of AXI4S Width Converter | 7.388 | 120.752 | 256.993 |
This change implements a WidthConverter that can handle widths that are not strict multiples of each other. To achieve this, we divide the data into blocks and implement a double-pointer queue with special handling for irregular offsets.
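
To illustrate the idea, here is a minimal behavioral sketch in Python. It is not the hardware implementation from this change and it is not cycle-accurate. It assumes that the block width is the greatest common divisor of the two widths, that words are packed least-significant block first, and that the queue holds one input word plus one output word worth of blocks. The names `WidthConverterModel` and `convert` are hypothetical and exist only for this example.

```python
from math import gcd


class WidthConverterModel:
    """Behavioral model of a stream width converter (illustrative only)."""

    def __init__(self, in_width, out_width):
        # Assumption: blocks are gcd(in_width, out_width) bits wide, so both
        # word sizes are a whole number of blocks even when neither width
        # is a multiple of the other.
        self.block_width = gcd(in_width, out_width)
        self.in_width = in_width
        self.in_blocks = in_width // self.block_width
        self.out_blocks = out_width // self.block_width
        # Assumption: room for one input word plus one output word.
        self.capacity = self.in_blocks + self.out_blocks
        self.blocks = [0] * self.capacity
        self.write_ptr = 0  # next block slot to fill
        self.read_ptr = 0   # next block slot to drain
        self.count = 0      # number of valid blocks in the queue

    def can_push(self):
        return self.capacity - self.count >= self.in_blocks

    def can_pop(self):
        return self.count >= self.out_blocks

    def push(self, word):
        """Split one input word into blocks and enqueue them."""
        if not self.can_push():
            raise RuntimeError("queue full")
        if not 0 <= word < (1 << self.in_width):
            raise ValueError("word does not fit in in_width bits")
        mask = (1 << self.block_width) - 1
        for i in range(self.in_blocks):
            self.blocks[self.write_ptr] = (word >> (i * self.block_width)) & mask
            # The pointer wraps at an arbitrary offset inside a word.
            self.write_ptr = (self.write_ptr + 1) % self.capacity
        self.count += self.in_blocks

    def pop(self):
        """Dequeue enough blocks to assemble one output word."""
        if not self.can_pop():
            raise RuntimeError("not enough data")
        word = 0
        for i in range(self.out_blocks):
            word |= self.blocks[self.read_ptr] << (i * self.block_width)
            self.read_ptr = (self.read_ptr + 1) % self.capacity
        self.count -= self.out_blocks
        return word


def convert(words, in_width, out_width):
    """Run a list of input words through the model and collect the output."""
    conv = WidthConverterModel(in_width, out_width)
    out = []
    for w in words:
        conv.push(w)
        while conv.can_pop():
            out.append(conv.pop())
    return out


# 12-bit words repacked as 8-bit words: 12 is not a multiple of 8.
print([hex(w) for w in convert([0xABC, 0xDEF], 12, 8)])
```

With 12-bit input and 8-bit output the block width is 4 bits, so each output word can straddle two input words. The read and write pointers advance by different strides (2 and 3 blocks here), which is the kind of irregular offset the description refers to. Any trailing blocks that do not fill a complete output word stay in the queue.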