manodeep closed this 1 year ago
Cool! Will take a look. How's the performance compared to the FALLBACK kernel?
It's a bit of a mixed bag. In single precision, it takes 3.6s with NEON and 5.8s with FALLBACK, while the times are about the same in double precision.
Realised that we have run out of Travis CI build minutes, and no tests have run on Travis CI for nearly a year now. Hence, the arm64 tests will have to run on some other CI; I'm currently trying out CircleCI.
I am trying out a bunch of new optimisations alongside this ARM64 implementation, so it might take a while to finalise. Will re-open once ready.
Both rpavg and weightavg are wrong, and the tests fail locally. The good news is that the pair counts match.
Added a checklist for the PR.