wolfpld / etcpak

The fastest ETC compressor on the planet

Add NEON version of heuristic selector #20

Closed m1awicki closed 3 years ago

m1awicki commented 3 years ago

Description

Three functions, hMin(), hMax() and CalculateLuma(), are now implemented with NEON intrinsics to speed up the computations done by the heuristic selector.
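For readers unfamiliar with this kind of change, here is a minimal sketch of what NEON versions of a horizontal min/max and a luma computation can look like. It is not the code from this PR: the helper names (`HorizontalMin16`, `HorizontalMax16`, `Luma8`), their signatures, and the luma weights (an integer approximation of Rec. 601, 77/150/29 out of 256) are assumptions made for illustration, and etcpak's actual `hMin()`, `hMax()` and `CalculateLuma()` may differ. The NEON path is only compiled on AArch64; elsewhere a scalar fallback with the same results is used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#define SKETCH_USE_NEON 1
#endif

// Hypothetical helper: smallest of 16 unsigned bytes.
static inline uint8_t HorizontalMin16(const uint8_t* v)
{
#ifdef SKETCH_USE_NEON
    // vminvq_u8 reduces all 16 lanes to a single minimum (AArch64 only).
    return vminvq_u8(vld1q_u8(v));
#else
    return *std::min_element(v, v + 16);
#endif
}

// Hypothetical helper: largest of 16 unsigned bytes.
static inline uint8_t HorizontalMax16(const uint8_t* v)
{
#ifdef SKETCH_USE_NEON
    return vmaxvq_u8(vld1q_u8(v));
#else
    return *std::max_element(v, v + 16);
#endif
}

// Hypothetical helper: luma of 8 interleaved RGBA pixels.
// Weights are an assumption: (77*R + 150*G + 29*B) >> 8.
static inline void Luma8(const uint8_t* rgba, uint8_t* out)
{
#ifdef SKETCH_USE_NEON
    // vld4_u8 de-interleaves 32 bytes into R, G, B, A vectors of 8 lanes.
    const uint8x8x4_t px = vld4_u8(rgba);
    uint16x8_t acc = vmull_u8(px.val[0], vdup_n_u8(77));
    acc = vmlal_u8(acc, px.val[1], vdup_n_u8(150));
    acc = vmlal_u8(acc, px.val[2], vdup_n_u8(29));
    // The weights sum to 256, so the 16-bit accumulator cannot overflow.
    vst1_u8(out, vshrn_n_u16(acc, 8));
#else
    for (int i = 0; i < 8; i++)
    {
        const unsigned r = rgba[i * 4 + 0];
        const unsigned g = rgba[i * 4 + 1];
        const unsigned b = rgba[i * 4 + 2];
        out[i] = static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
    }
#endif
}
```

Both paths compute the same values, which makes it easy to check the NEON code against the scalar version on a desktop machine before benchmarking on ARM hardware.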

Results from odroid-n2 (Mpx/s):

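|           | ST     | MT      | HST    | HMT     |
|-----------|--------|---------|--------|---------|
| ETC1      | 45.098 | 201.383 | 45.080 | 200.505 |
| ETC2      | 41.162 | 179.972 | 60.623 | 318.686 |
| ETC2 RGBA | 26.102 | 140.651 | 33.674 | 172.449 |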

ST - single thread
MT - multi thread
H - heuristic selector enabled (HST, HMT)
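To read the numbers as speedups, each heuristic column can be divided by its non-heuristic counterpart. The sketch below does only that arithmetic on the figures reported above; the `Row` struct and the `Speedup` and `PrintSpeedups` helpers are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Throughput in Mpx/s, copied from the ODROID-N2 results above.
struct Row
{
    const char* mode;
    double st, mt, hst, hmt;
};

static const Row kResults[] = {
    { "ETC1",      45.098, 201.383, 45.080, 200.505 },
    { "ETC2",      41.162, 179.972, 60.623, 318.686 },
    { "ETC2 RGBA", 26.102, 140.651, 33.674, 172.449 },
};

// Ratio of heuristic throughput to baseline throughput (above 1.0 means faster).
static inline double Speedup(double baseline, double heuristic)
{
    return heuristic / baseline;
}

// Prints one line per mode with single- and multi-thread speedups.
static void PrintSpeedups()
{
    for (const Row& r : kResults)
    {
        std::printf("%-10s ST x%.2f  MT x%.2f\n", r.mode,
                    Speedup(r.st, r.hst), Speedup(r.mt, r.hmt));
    }
}
```

By this measure ETC1 is essentially unchanged, while ETC2 gains the most, especially in the multi-threaded case.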

Overall difference between images compressed with the new and the old compression mode selection mechanisms (results obtained on ARM-based hardware):

diff_noheufix_heukodim23 png ktx

diff_noheufix_heukodim03 png ktx

diff_noheufix_heukodim05 png ktx

diff_noheufix_heukodim20 png ktx

Detailed comparison for kodim23.png. The image on the left (new) was compressed using the new compression mode selector implemented with NEON intrinsics, while the image on the right (old) was compressed with the heuristic selector disabled (--disable-heuristics):

cmp_arm_neon