Open xxxxxxLD opened 2 years ago
Hi @xxxxxxLD, my understanding is that this vld1 variant comes from the later ARMv7/AArch64 intrinsics, while I have implemented only the original ARMv7 version. In any case, there is no fast way to implement it on x86, so you are free to use your own implementation, for example by calling vld1_u8 three times at the corresponding offsets.
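For illustration, the "call vld1_u8 three times" suggestion could look roughly like the sketch below. It is not part of NEON_2_SSE.h. The helper name `load_u8_x3` and the `USE_NEON_2_SSE` macro are hypothetical, and the stand-in types in the last branch exist only so the snippet compiles when neither header is present. It assumes the usual semantics of vld1_u8_x3: 24 consecutive bytes loaded into three 8-byte vectors with no de-interleaving (unlike vld3_u8).

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  include <arm_neon.h>      /* native NEON build */
#elif defined(USE_NEON_2_SSE) /* hypothetical opt-in macro for this sketch */
#  include "NEON_2_SSE.h"    /* x86 build through the translation header */
#else
/* Stand-in types and helpers (illustration only) so the sketch
   compiles without either header. */
typedef struct { uint8_t b[8]; } uint8x8_t;
typedef struct { uint8x8_t val[3]; } uint8x8x3_t;
static inline uint8x8_t vld1_u8(const uint8_t *p)
{
    uint8x8_t v;
    memcpy(v.b, p, 8);
    return v;
}
static inline void vst1_u8(uint8_t *p, uint8x8_t v)
{
    memcpy(p, v.b, 8);
}
#endif

/* Hypothetical replacement for vld1_u8_x3: three plain 8-byte loads
   from consecutive addresses, with no de-interleaving. */
static inline uint8x8x3_t load_u8_x3(const uint8_t *ptr)
{
    uint8x8x3_t r;
    r.val[0] = vld1_u8(ptr);       /* bytes  0..7  */
    r.val[1] = vld1_u8(ptr + 8);   /* bytes  8..15 */
    r.val[2] = vld1_u8(ptr + 16);  /* bytes 16..23 */
    return r;
}
```

The helper uses its own name rather than `vld1_u8_x3` so that it does not clash with toolchains whose arm_neon.h already provides that intrinsic.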
In the original ARM NEON intrinsics, vld1_u8_x3 (Load multiple single-element structures to one, two, three, or four registers) is available, and it seems to be available on ARMv7 as well.
See https://developer.arm.com/architectures/instruction-sets/intrinsics/
However, it is not available in NEON_2_SSE.h. Which ARM processor versions does NEON_2_SSE.h support?