tensil-ai / tensil

Open source machine learning accelerators
https://www.tensil.ai

Add width converter #66

Closed: tdb-alcorn closed this pull request 2 years ago

tdb-alcorn commented 2 years ago

This change implements a WidthConverter that can handle widths that are not strict multiples of each other. To achieve this, we divide the data into blocks and then implement a double-pointer queue with special handling for irregular offsets.
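The description above is brief, so the following is a minimal software model of the idea, for illustration only. It is not the code in this PR. It assumes that the block size is gcd(in_width, out_width), that blocks are ordered least-significant first, and that the "double-pointer queue" behaves like a circular buffer with separate read and write pointers. The Python class reuses the name WidthConverter for readability, but it and the helpers `convert` and `pack` are hypothetical. The hardware's special handling for irregular offsets is not modelled.

```python
from math import gcd
from typing import Iterable, List


class WidthConverter:
    """Hypothetical software model of a stream width converter (not the PR's code).

    Converts a stream of `in_width`-bit words into `out_width`-bit words when
    neither width is a multiple of the other. Words are split into blocks of
    gcd(in_width, out_width) bits (an assumption) and passed through a circular
    queue with separate write and read pointers.
    """

    def __init__(self, in_width: int, out_width: int) -> None:
        self.in_width = in_width
        self.block = gcd(in_width, out_width)        # assumed block size in bits
        self.in_blocks = in_width // self.block      # blocks per input word
        self.out_blocks = out_width // self.block    # blocks per output word
        # Room for one full input word plus a partially assembled output word.
        self.capacity = self.in_blocks + self.out_blocks
        self.buf = [0] * self.capacity
        self.wr = 0      # write pointer: next free slot
        self.rd = 0      # read pointer: oldest queued block
        self.count = 0   # number of blocks currently queued

    def can_push(self) -> bool:
        return self.count + self.in_blocks <= self.capacity

    def can_pop(self) -> bool:
        return self.count >= self.out_blocks

    def push(self, word: int) -> None:
        """Split one input word into blocks, least-significant block first."""
        if not 0 <= word < (1 << self.in_width):
            raise ValueError("word does not fit in in_width bits")
        if not self.can_push():
            raise OverflowError("queue full; pop an output word first")
        mask = (1 << self.block) - 1
        for i in range(self.in_blocks):
            self.buf[self.wr] = (word >> (i * self.block)) & mask
            self.wr = (self.wr + 1) % self.capacity   # write pointer wraps around
        self.count += self.in_blocks

    def pop(self) -> int:
        """Assemble one output word from the oldest queued blocks."""
        if not self.can_pop():
            raise IndexError("not enough queued blocks for an output word")
        word = 0
        for i in range(self.out_blocks):
            word |= self.buf[self.rd] << (i * self.block)
            self.rd = (self.rd + 1) % self.capacity   # read pointer wraps around
        self.count -= self.out_blocks
        return word


def convert(words: Iterable[int], in_width: int, out_width: int) -> List[int]:
    """Convert a whole stream; a trailing partial output word is not emitted."""
    conv = WidthConverter(in_width, out_width)
    out: List[int] = []
    for w in words:
        conv.push(w)
        while conv.can_pop():          # drain eagerly so the next push always fits
            out.append(conv.pop())
    return out


def pack(words: Iterable[int], width: int) -> int:
    """Concatenate words into one integer, first word in the lowest bits."""
    total = 0
    for i, w in enumerate(words):
        total |= w << (i * width)
    return total


if __name__ == "__main__":
    # Five 24-bit words (120 bits) become three 40-bit words; gcd(24, 40) = 8.
    inputs = [0xABCDEF, 0x123456, 0x789ABC, 0xDEF012, 0x345678]
    outputs = convert(inputs, 24, 40)
    assert pack(outputs, 40) == pack(inputs, 24)
    print([hex(w) for w in outputs])
```

Because both widths are whole numbers of gcd-sized blocks, this model never has to split a block across two output words. After each eager drain, fewer than `out_blocks` blocks remain queued, so the next push always fits in a buffer of `in_blocks + out_blocks` slots.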

petrohi commented 2 years ago

Everything checks out on the FPGA, and the new WidthConverter matches the performance of the AXI Width Converter almost exactly. LGTM!

| Configuration | ResNet20 CIFAR ONNX (ms) | YOLOv4 Tiny @416 ONNX (ms) | ResNet50 ImageNet ONNX (ms) |
|---|---|---|---|
| A. ZCU104 baseline | 13.099 | 294.071 | 514.279 |
| B. A, plus external AXI4S Width Converter instead of Transmission | 11.168 | 262.804 | 473.956 |
| C. B, plus two-cycle read/write fix | 9.661 | 214.892 | 415.953 |
| D. C, plus fix for delay between ops | 7.387 | 120.748 | 256.987 |
| E. D with WidthConverter instead of AXI4S Width Converter | 7.388 | 120.752 | 256.993 |
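To put "matched almost exactly" into numbers, a small script can compare rows D and E, plus the overall gain over baseline A, using the latencies reported in the table. It only performs arithmetic on those reported figures. The variable and function names are our own.

```python
# Latencies in milliseconds, as reported in the table (rows A, D and E).
MODELS = ["ResNet20 CIFAR", "YOLOv4 Tiny @416", "ResNet50 ImageNet"]
A = [13.099, 294.071, 514.279]   # ZCU104 baseline
D = [7.387, 120.748, 256.987]    # with the external AXI4S Width Converter
E = [7.388, 120.752, 256.993]    # with the new WidthConverter instead


def rel_diff_pct(base: float, other: float) -> float:
    """Signed relative difference of `other` versus `base`, in percent."""
    return (other - base) / base * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster the `after` latency is than `before`."""
    return before / after


if __name__ == "__main__":
    for name, a, d, e in zip(MODELS, A, D, E):
        print(f"{name}: E vs D {rel_diff_pct(d, e):+.4f}%, "
              f"A -> E speedup {speedup(a, e):.2f}x")
```

On the reported numbers, every E-versus-D difference is below 0.02%.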