openppl-public / ppl.cv

ppl.cv is openPPL's high-performance image processing library, with support for multiple platforms.
Apache License 2.0

Optimize aarch64 rotate #133

Closed by VOIDMalkuth 6 months ago

VOIDMalkuth commented 6 months ago

Shorten the NEON instruction sequence used by the aarch64 rotate implementation.
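For context, here is a minimal scalar sketch of what a 90-degree rotate of an interleaved multi-channel image computes, which is the work a NEON kernel has to vectorize. This is not ppl.cv's code or API: the function name `rotate90_clockwise_ref`, the tightly packed row-major layout, and the clockwise direction are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar reference (not part of ppl.cv).
// Assumes a tightly packed, row-major, channel-interleaved image and a
// clockwise rotation. The output has `width` rows and `height` columns.
template <typename T, int Channels>
std::vector<T> rotate90_clockwise_ref(const std::vector<T>& src,
                                      int height, int width) {
    std::vector<T> dst(src.size());
    const int dst_width = height;
    for (int r = 0; r < width; ++r) {        // destination row
        for (int c = 0; c < height; ++c) {   // destination column
            // Destination pixel (r, c) comes from source pixel (height-1-c, r).
            const std::size_t s =
                (static_cast<std::size_t>(height - 1 - c) * width + r) * Channels;
            const std::size_t d =
                (static_cast<std::size_t>(r) * dst_width + c) * Channels;
            for (int ch = 0; ch < Channels; ++ch) {
                dst[d + ch] = src[s + ch];
            }
        }
    }
    return dst;
}
```

A reference like this is mainly useful for checking an optimized kernel's output pixel by pixel on small inputs.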

Benchmark results on Kryo 260 (Cortex-A53):

Before

BM_Rotate_ppl_aarch64<float, c3, 90>/640/480/iterations:10/manual_time           4917958 ns    236724000 ns           10 items_per_second=203.336/s
BM_Rotate_ppl_aarch64<float, c3, 90>/1920/1080/iterations:10/manual_time        32919244 ns   1581717344 ns           10 items_per_second=30.3774/s
BM_Rotate_ppl_aarch64<uint8_t, c3, 90>/640/480/iterations:10/manual_time         2124198 ns    101570155 ns           10 items_per_second=470.766/s
BM_Rotate_ppl_aarch64<uint8_t, c3, 90>/1920/1080/iterations:10/manual_time      13242732 ns    638636764 ns           10 items_per_second=75.5131/s

After

BM_Rotate_ppl_aarch64<float, c3, 90>/640/480/iterations:10/manual_time           4576840 ns    218713006 ns           10 items_per_second=218.491/s
BM_Rotate_ppl_aarch64<float, c3, 90>/1920/1080/iterations:10/manual_time        27311788 ns   1319167490 ns           10 items_per_second=36.6142/s
BM_Rotate_ppl_aarch64<uint8_t, c3, 90>/640/480/iterations:10/manual_time         2052890 ns     98462090 ns           10 items_per_second=487.118/s
BM_Rotate_ppl_aarch64<uint8_t, c3, 90>/1920/1080/iterations:10/manual_time      13082652 ns    629270526 ns           10 items_per_second=76.4371/s
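To make the before/after comparison easier to read, the sketch below computes the relative improvement from the first time column of the benchmark output. The helper names `reduction_percent` and `speedup` are illustrative only and are not part of ppl.cv or the benchmark suite.

```cpp
#include <cstdio>

// Percentage by which the time dropped going from before_ns to after_ns.
double reduction_percent(double before_ns, double after_ns) {
    return (before_ns - after_ns) / before_ns * 100.0;
}

// Speedup factor: values above 1.0 mean the "after" run is faster.
double speedup(double before_ns, double after_ns) {
    return before_ns / after_ns;
}

// Prints one summary line for a benchmark case.
void print_case(const char* name, double before_ns, double after_ns) {
    std::printf("%-24s %6.2f%% less time, %.3fx\n", name,
                reduction_percent(before_ns, after_ns),
                speedup(before_ns, after_ns));
}

// Prints the four cases using the times reported above.
void print_summary() {
    print_case("float c3 640x480",    4917958.0,  4576840.0);
    print_case("float c3 1920x1080",  32919244.0, 27311788.0);
    print_case("uint8_t c3 640x480",  2124198.0,  2052890.0);
    print_case("uint8_t c3 1920x1080", 13242732.0, 13082652.0);
}
```

With the numbers above, this works out to roughly 6.9% and 17.0% less time for the float cases (640x480 and 1920x1080) and roughly 3.4% and 1.2% for the uint8_t cases, so the gain is concentrated in the float path.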