Is your feature request related to a problem? Please describe.
SIMD acceleration is implemented for x86 and tracked in #21 for 32-bit ARM. We also need support for 64-bit ARM.
Describe the solution you'd like
We expect most of the design around extracting architecture-specific bits to be done in #14. After that, a similar approach can be used here and in #21.
Additional context
Find NEON intrinsics documentation here.
I am not knowledgeable about NEON, and I don't even know how to emulate an ARM system locally, so help here is really needed.