Open endomorphosis opened 3 months ago
According to Wikipedia, AMX is available only on Sapphire Rapids and newer: https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions
See also BFLOAT16 ticket: https://github.com/opea-project/GenAIExamples/issues/330
@endomorphosis Thank you for your feedback! This requirement needs support in FAISS. Once FAISS is updated with the PR, OPEA will include it as soon as possible!
This requirement has to be handled by the FAISS project. We can't help with this issue.
Could we close this issue?
I am just requesting that, once the FAISS project accepts the pull request, the OPEA implementation support AMX. I am not asking OPEA to solve the memory leak issue in the AMX implementation.
We have created a feature request to OPEA to include FAISS.
Thank you!
Got it! Could you close this issue?
Thank you for updating me on this topic.
I see that the PR author updated the issue yesterday. I am working with a college student to build a FAISS index of all the Wikipedia abstracts / summaries for a GenAI example. I also intend to build a very large embedding database of all the caselaw and to publish retrieval benchmarks on these large datasets.
I believe that 4x speedups are significant enough that I may not need to shard the index, while still keeping retrieval service latency short enough that it does not impact UX.
I see the PR to support AMX in FAISS: https://github.com/facebookresearch/faiss/pull/3266 It's still pending. Hope it's merged soon!
@endomorphosis Could we close this issue, since the FAISS PR has been open for a long time?
https://github.com/endomorphosis/laion-embeddings
I am working on a package which will create embeddings for any Hugging Face dataset and set up a search engine (faiss / qdrant / elasticsearch), given an arbitrary list of models and an arbitrary number of embedding endpoints. The primary key will be the IPFS hash, so that each row can be retrieved from the IPFS / Filecoin network.
I intend to send it to the Linux Foundation / OPEA group for inclusion in this project. It is not yet ready for inclusion, but right now I am clustering ~6 million rows of embeddings with 1536 dimensions with FAISS, and the time per iteration is 2768s / iteration for 4096 k-means clusters on a dual-socket Xeon E5-2690V4 system. At this rate it will take 32 hours for 300 iterations to cluster them (375s * 300 / 3600 / 24), which is why having AMX instructions that could reduce this by 5x is important.
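For anyone who wants to rerun the estimate, here is a minimal sketch of the arithmetic. It assumes every k-means iteration costs the same and that a speedup applies uniformly. The helper name `total_hours` is made up for illustration, and the 5x factor is the hoped-for figure from this comment, not a measured result. Both per-iteration figures quoted above (2768 s and 375 s) are evaluated.

```python
def total_hours(seconds_per_iteration: float, iterations: int, speedup: float = 1.0) -> float:
    """Wall-clock hours for a run, assuming a constant cost per iteration."""
    return seconds_per_iteration * iterations / speedup / 3600


ITERATIONS = 300          # iteration count from the comment
ASSUMED_SPEEDUP = 5.0     # hypothetical AMX speedup, not a measurement

for label, secs in (("2768 s/iteration", 2768.0), ("375 s/iteration", 375.0)):
    baseline = total_hours(secs, ITERATIONS)
    accelerated = total_hours(secs, ITERATIONS, speedup=ASSUMED_SPEEDUP)
    print(f"{label}: {baseline:.2f} h baseline, {accelerated:.2f} h at {ASSUMED_SPEEDUP:.0f}x")
```

With 375 s per iteration the baseline comes to 31.25 hours, in line with the roughly 32 hours quoted. With 2768 s per iteration it comes to about 231 hours, or about 9.6 days.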
The status of the FAISS PR was that there was a request to rebase the AMX branch and to refactor some of the methods: https://github.com/facebookresearch/faiss/pull/3266#pullrequestreview-2338153492
I believe that those were done in this recent commit: https://github.com/facebookresearch/faiss/pull/3266/commits/9e3432369ac0d665fde65cb23038217a2c70d931
https://github.com/facebookresearch/faiss/pull/3266
IndexFlatIP search performance accelerated by oneDNN/AMX improves by 1.7X to 5X compared to the default inner_product, in scenarios with 1 query, dimensions ranging from 64 to 1024, and 1,000,000 vectors.
IndexFlatIP search performance accelerated by oneDNN/AMX improves by up to 4X compared to the BLAS inner_product, in scenarios with 1000 queries, dimensions ranging from 64 to 1024, and 1,000,000 vectors.
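For context on what is being accelerated: a flat inner-product index does exact search by scoring every stored vector against each query and keeping the top k. The sketch below is a pure-Python stand-in for that idea, not FAISS's implementation. The function name `flat_ip_search` and the toy sizes are assumptions for illustration. With many queries the scoring step amounts to one large matrix product, which is the kind of work BLAS or AMX kernels speed up.

```python
import heapq
import math
import random


def flat_ip_search(database, queries, k):
    """Exact top-k search by inner product: score every database vector per query."""
    results = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, v)) for v in database]
        top = heapq.nlargest(k, range(len(database)), key=scores.__getitem__)
        results.append([(i, scores[i]) for i in top])
    return results


def unit_vector(dim, rng):
    """Random unit-length vector, so a vector's best match is itself."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


rng = random.Random(0)
DIM, N_VECTORS = 64, 1000  # toy sizes; the PR description uses 1,000,000 vectors
database = [unit_vector(DIM, rng) for _ in range(N_VECTORS)]
queries = [database[7], database[42]]  # query with two stored vectors

hits = flat_ip_search(database, queries, k=3)
for query_hits in hits:
    print([(idx, round(score, 3)) for idx, score in query_hits])
```

Because the vectors are normalized, each query's top hit is its own row in the database, which makes the result easy to check by eye.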