tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Use Arm Neon equivalent instructions to Intel VNNI #3895

Open amitdo opened 2 years ago

amitdo commented 2 years ago

... for the integer dot product.

amitdo commented 2 years ago

The Arm equivalent of the VNNI instruction VPDPBUSD seems to be USDOT.

https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/USDOT--by-element---Dot-Product-with-unsigned-and-signed-integers--vector--by-element--

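There are also variants for other signedness combinations: SDOT (signed by signed), UDOT (unsigned by unsigned), and SUDOT (signed by unsigned, by-element form only).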
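
For reference, here is a minimal scalar sketch of what one 32-bit lane of VPDPBUSD or USDOT computes: four unsigned bytes multiplied by four signed bytes, with the products added to a 32-bit accumulator. The function names `DotLaneU8S8` and `DotU8S8` are hypothetical and are not taken from the Tesseract sources. The hardware accumulator wraps on overflow, while this model assumes the sum fits in `int32_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of one 32-bit lane of VPDPBUSD / USDOT (hypothetical name):
// four unsigned bytes times four signed bytes, summed into the accumulator.
// Assumes the result fits in int32_t; the real instructions wrap instead.
inline int32_t DotLaneU8S8(int32_t acc, const uint8_t *u, const int8_t *s) {
  for (int k = 0; k < 4; ++k) {
    acc += static_cast<int32_t>(u[k]) * static_cast<int32_t>(s[k]);
  }
  return acc;
}

// Hypothetical helper: dot product over n bytes, consumed in groups of four,
// which is how a vector implementation would feed each 32-bit lane.
inline int32_t DotU8S8(const uint8_t *u, const int8_t *s, std::size_t n) {
  assert(n % 4 == 0);  // the lane model only handles whole groups of four
  int32_t acc = 0;
  for (std::size_t i = 0; i < n; i += 4) {
    acc = DotLaneU8S8(acc, u + i, s + i);
  }
  return acc;
}
```

A scalar model like this could serve as a reference when checking a Neon or VNNI implementation against the same inputs.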

amitdo commented 2 years ago

CC: @robinwatts

amitdo commented 2 years ago

Arm's USMMLA instruction seems even better.

https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/USMMLA--vector---Unsigned-and-signed-8-bit-integer-matrix-multiply-accumulate--vector--

I wonder if Intel has an equivalent instruction.
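
As a rough sketch of why USMMLA looks attractive: per the Arm documentation linked above, it multiplies a 2x8 matrix of unsigned bytes by an 8x2 matrix of signed bytes and adds the 2x2 result to a 32-bit accumulator, so each destination element is an 8-way dot product. The scalar model below assumes row-major storage with the second operand held as two rows of eight bytes. The name `MatMulAccU8S8_2x8` is hypothetical.

```cpp
#include <cstdint>

// Scalar model of USMMLA (hypothetical name, row-major layout assumed):
//   acc[2][2] += a[2][8] * transpose(b[2][8])
// a holds unsigned bytes, b holds signed bytes, acc holds 32-bit integers.
// Assumes the sums fit in int32_t; the real instruction wraps instead.
inline void MatMulAccU8S8_2x8(int32_t acc[4], const uint8_t a[16],
                              const int8_t b[16]) {
  for (int i = 0; i < 2; ++i) {
    for (int j = 0; j < 2; ++j) {
      int32_t sum = 0;
      // Each output element is an 8-way dot product of row i of a
      // with row j of b.
      for (int k = 0; k < 8; ++k) {
        sum += static_cast<int32_t>(a[i * 8 + k]) *
               static_cast<int32_t>(b[j * 8 + k]);
      }
      acc[i * 2 + j] += sum;
    }
  }
}
```

Compared with the 4-way dot product per lane of USDOT, one such operation yields four 8-way dot products.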