xboot / libonnx

A lightweight, portable, pure C99 ONNX inference engine for embedded devices with hardware acceleration support.
MIT License

[Conv] im2col optimization #4

Closed by ReinForce-II 4 years ago

ReinForce-II commented 4 years ago

benchmark: super_resolution_10

5900X:

before:
Constant-10    default  10  0.200(us)
Conv-10        default  20  692877.650(us)
Relu-10        default  15  795.733(us)
Reshape-10     default  10  129.200(us)
Transpose-10   default  5   6785.200(us)

after:
Constant-10    default  20  0.250(us)
Conv-10        default  40  81711.875(us)
Relu-10        default  30  1220.067(us)
Reshape-10     default  20  126.400(us)
Transpose-10   default  10  6212.700(us)

M1:

before:
Constant-10    default  10  0.000(us)
Conv-10        default  20  640919.200(us)
Relu-10        default  15  368.800(us)
Reshape-10     default  10  37.200(us)
Transpose-10   default  5   3711.400(us)

after:
Constant-10    default  20  0.000(us)
Conv-10        default  40  158875.825(us)
Relu-10        default  30  406.733(us)
Reshape-10     default  20  35.750(us)
Transpose-10   default  10  3768.100(us)

RK3399:

A72:

before:
Constant-10    default  2   1.000(us)
Conv-10        default  4   2608700.000(us)
Relu-10        default  3   2887.333(us)
Reshape-10     default  2   525.000(us)
Transpose-10   default  1   18725.000(us)

after:
Constant-10    default  10  0.700(us)
Conv-10        default  20  515708.400(us)
Relu-10        default  15  2934.600(us)
Reshape-10     default  10  635.300(us)
Transpose-10   default  5   18132.400(us)

A53:

before:
Constant-10    default  2   0.500(us)
Conv-10        default  4   7815257.500(us)
Relu-10        default  3   6194.667(us)
Reshape-10     default  2   964.500(us)
Transpose-10   default  1   47420.000(us)

after:
Constant-10    default  2   1.000(us)
Conv-10        default  4   1974511.750(us)
Relu-10        default  3   6270.667(us)
Reshape-10     default  2   977.500(us)
Transpose-10   default  1   45606.000(us)
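
For readers unfamiliar with the technique named in the title, here is a minimal sketch of how an im2col-based convolution is usually structured. It is not the code from this pull request: the function names (`conv_out_dim`, `im2col`, `gemm`, `conv2d_im2col`) are hypothetical, and it assumes a single CHW float image, weights laid out as [oc][c][kh][kw], symmetric padding, and no dilation, groups, or bias. The idea is to unfold every receptive field into a column of a matrix so that the whole convolution becomes one matrix multiply, which has a far more cache-friendly inner loop than the direct nested-loop form.

```c
#include <assert.h>
#include <stddef.h>

/* Output size along one spatial axis (assumes no dilation). */
int conv_out_dim(int in, int k, int pad, int stride)
{
    return (in + 2 * pad - k) / stride + 1;
}

/*
 * Unfold a CHW image into a row-major matrix with (c*kh*kw) rows and
 * (oh*ow) columns. Taps that fall into the padding are written as 0.
 */
void im2col(const float *img, int c, int h, int w,
            int kh, int kw, int pad, int stride, float *col)
{
    assert(stride > 0);
    int oh = conv_out_dim(h, kh, pad, stride);
    int ow = conv_out_dim(w, kw, pad, stride);
    size_t idx = 0;

    for (int ch = 0; ch < c; ch++)
        for (int ky = 0; ky < kh; ky++)
            for (int kx = 0; kx < kw; kx++)
                for (int oy = 0; oy < oh; oy++) {
                    int iy = oy * stride - pad + ky;
                    for (int ox = 0; ox < ow; ox++) {
                        int ix = ox * stride - pad + kx;
                        int inside = iy >= 0 && iy < h && ix >= 0 && ix < w;
                        col[idx++] = inside
                            ? img[((size_t)ch * h + iy) * w + ix]
                            : 0.0f;
                    }
                }
}

/* out[m x n] = a[m x k] * b[k x n], all row-major; naive i-p-j loop order. */
void gemm(const float *a, const float *b, float *out, int m, int k, int n)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++)
            out[(size_t)i * n + j] = 0.0f;
        for (int p = 0; p < k; p++) {
            float av = a[(size_t)i * k + p];
            for (int j = 0; j < n; j++)
                out[(size_t)i * n + j] += av * b[(size_t)p * n + j];
        }
    }
}

/*
 * Convolution as im2col followed by one matrix multiply.
 * col is caller-provided scratch of (c*kh*kw) * (oh*ow) floats;
 * out receives oc * oh * ow floats.
 */
void conv2d_im2col(const float *img, const float *wgt, float *out, float *col,
                   int c, int h, int w, int oc,
                   int kh, int kw, int pad, int stride)
{
    int oh = conv_out_dim(h, kh, pad, stride);
    int ow = conv_out_dim(w, kw, pad, stride);

    im2col(img, c, h, w, kh, kw, pad, stride, col);
    gemm(wgt, col, out, oc, c * kh * kw, oh * ow);
}
```

The cost of this layout is the scratch buffer: it holds c*kh*kw*oh*ow floats, roughly kh*kw times the size of the input, so on memory-constrained targets it may need to be processed in tiles. The `gemm` here is deliberately naive; the multiply is the natural place to plug in a tuned or hardware-accelerated routine.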