Open thomasahle opened 1 year ago
There's also the simple idea of using the 512-bit shuffle (_mm512_shuffle_epi8), which processes four 128-bit lanes per instruction instead of two with AVX2 and one with SSE: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_shuffle_epi8&ig_expand=6664,6629,6629
It should probably look something like this: https://gist.github.com/thomasahle/dad66753ffecda62f86b6e6eaf0ec8e5
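To make the lane behaviour concrete, here is a minimal sketch (not the code from the gist) of what the 512-bit shuffle computes. The helper names `shuffle512_scalar` and `shuffle512` are made up for illustration. The scalar version models the documented semantics of `_mm512_shuffle_epi8`: each of the four 128-bit lanes does its own 16-entry byte lookup, and an index byte with the high bit set yields zero. The intrinsic path is only compiled when the compiler defines `__AVX512BW__` (for example with `-mavx512bw`), and assumes the CPU running the binary supports it.

```c
#include <stdint.h>
#include <string.h>
#ifdef __AVX512BW__
#include <immintrin.h>
#endif

/* Scalar model of _mm512_shuffle_epi8 (hypothetical helper name).
 * The 64 bytes are treated as four independent 16-byte lanes.
 * For output byte i: if the high bit of idx[i] is set the result is 0,
 * otherwise it is the table byte at (start of i's lane) + (idx[i] & 0x0F). */
static void shuffle512_scalar(const uint8_t table[64],
                              const uint8_t idx[64],
                              uint8_t out[64]) {
    for (int i = 0; i < 64; i++) {
        int lane_base = i & ~15; /* 0, 16, 32 or 48 */
        out[i] = (idx[i] & 0x80) ? 0 : table[lane_base + (idx[i] & 0x0F)];
    }
}

/* Same operation, using the intrinsic when AVX-512BW is enabled at
 * compile time and falling back to the scalar model otherwise. */
static void shuffle512(const uint8_t table[64],
                       const uint8_t idx[64],
                       uint8_t out[64]) {
#ifdef __AVX512BW__
    __m512i t = _mm512_loadu_si512((const void *)table);
    __m512i x = _mm512_loadu_si512((const void *)idx);
    _mm512_storeu_si512((void *)out, _mm512_shuffle_epi8(t, x));
#else
    shuffle512_scalar(table, idx, out);
#endif
}
```

In a PQ-style scan this would mean four different 16-entry tables (one per lane) can be looked up with a single instruction, where the 256-bit version handles two and the 128-bit version one.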
AVX-512 has some nice features, such as support for fast float16 operations, which might allow us to do rescoring very quickly. The Quicker ADC paper (https://arxiv.org/pdf/1812.09162.pdf) also mentions some uses of AVX-512, such as {5,6,7}-bit lookup tables, though I don't think any of the top libraries, like ScaNN or Faiss, actually use that.