opea-project / GenAIComps

GenAI components at micro-service level; GenAI service composer to create mega-service
Apache License 2.0

Feature request: add AMX instructions for FAISS in the retrievers #307

Open endomorphosis opened 3 months ago

endomorphosis commented 3 months ago

https://github.com/facebookresearch/faiss/pull/3266

IndexFlatIP search performance accelerated by oneDNN/AMX improves by 1.7X to 5X over the default inner_product in scenarios with 1 query, dimensions ranging from 64 to 1024, and 1,000,000 vectors.

IndexFlatIP search performance accelerated by oneDNN/AMX improves by up to 4X over the BLAS inner_product in scenarios with 1,000 queries, dimensions ranging from 64 to 1024, and 1,000,000 vectors.
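
For readers who want to reproduce the shape of that benchmark, here is a minimal sketch using the standard FAISS Python API. The sizes follow the description above; the timing harness is illustrative, and whether oneDNN/AMX kernels are actually used depends on how FAISS was built (i.e. on the PR linked above being merged and enabled).

```python
# Minimal sketch of the scenario above: exact inner-product search over
# 1,000,000 vectors with IndexFlatIP. Whether oneDNN/AMX kernels are used
# depends entirely on how FAISS was built (see the PR linked above).
import time

import faiss
import numpy as np

d = 1024                       # the PR reports dimensions from 64 to 1024
nb, nq, k = 1_000_000, 1000, 10

rng = np.random.default_rng(0)
xb = rng.standard_normal((nb, d), dtype=np.float32)
xq = rng.standard_normal((nq, d), dtype=np.float32)

index = faiss.IndexFlatIP(d)   # exact (brute-force) inner-product index
index.add(xb)

t0 = time.perf_counter()
distances, ids = index.search(xq, k)
print(f"{nq} queries in {time.perf_counter() - t0:.3f} s")
```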

eero-t commented 3 months ago

According to Wikipedia, AMX is available only on Sapphire Rapids and newer: https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions

See also BFLOAT16 ticket: https://github.com/opea-project/GenAIExamples/issues/330
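
As a quick sanity check for whether a given Linux host exposes AMX at all (i.e. is Sapphire Rapids or newer), one can look for the AMX feature flags in /proc/cpuinfo. This is a Linux-specific sketch and not part of OPEA or FAISS.

```python
# Linux-specific sketch: check /proc/cpuinfo for the AMX feature flags
# (amx_tile, amx_bf16, amx_int8) reported on Sapphire Rapids and newer CPUs.
AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}

def has_amx(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                return AMX_FLAGS <= set(line.split(":", 1)[1].split())
    return False

print("AMX available:", has_amx())
```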

NeoZhangJianyu commented 3 months ago

@endomorphosis Thank you for your feedback! This requirement needs FAISS support. When FAISS is updated with that PR, OPEA will include it as soon as possible!

NeoZhangJianyu commented 3 months ago

This requirement needs to be handled by the FAISS project. We can't help with this issue.

Could we close this issue?

endomorphosis commented 3 months ago

I am just requesting that, once the FAISS project accepts the pull request, the OPEA implementation support AMX. I am not asking OPEA to solve the memory leak issue in the AMX implementation.

NeoZhangJianyu commented 3 months ago

We have created a feature request for OPEA to include FAISS.

Thank you!

NeoZhangJianyu commented 2 months ago

Got it! Could you close this issue?

endomorphosis commented 2 months ago

Thank you for updating me on this topic.

I see that the PR author updated the issue yesterday. I am working with a college student to build a FAISS index of all the Wikipedia abstracts/summaries for a GenAI example, and I also intend to build a very large embedding database of all the case law and publish retrieval benchmarks on these large datasets.

I believe that 4x speedups are significant enough that I may not need to shard the index while keeping retrieval service latency short enough that it does not impact UX.
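
For context on the sharding alternative mentioned above, FAISS can spread a flat index across shards and merge results at query time via faiss.IndexShards. A rough sketch follows; the dimension, shard count, and sizes are illustrative, and a real deployment would assign globally unique ids per shard (e.g. with IndexIDMap).

```python
# Illustrative sketch of the sharding alternative: split a flat inner-product
# index across several shards and let FAISS merge the per-shard results.
import faiss
import numpy as np

d, n_shards, per_shard = 768, 4, 250_000
rng = np.random.default_rng(0)

shards = faiss.IndexShards(d)
sub_indexes = []                      # keep Python references to the shards alive
for _ in range(n_shards):
    sub = faiss.IndexFlatIP(d)
    sub.add(rng.standard_normal((per_shard, d), dtype=np.float32))
    shards.add_shard(sub)
    sub_indexes.append(sub)

xq = rng.standard_normal((5, d), dtype=np.float32)
distances, ids = shards.search(xq, 10)
```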

NeoZhangJianyu commented 1 month ago

I see the PR to support AMX in FAISS: https://github.com/facebookresearch/faiss/pull/3266. It's still pending. Hope it's merged soon!

NeoZhangJianyu commented 2 weeks ago

@endomorphosis Could we close this issue, since the FAISS PR has been open for a long time?

endomorphosis commented 1 week ago

https://github.com/endomorphosis/laion-embeddings

I am working on a package which will create embeddings for any Hugging Face dataset and set up a search engine (FAISS / Qdrant / Elasticsearch), given an arbitrary list of models and an arbitrary number of embedding endpoints. The primary key will be the IPFS hash, so that each row can be retrieved from the IPFS / Filecoin network.
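
A rough illustration of that pipeline shape, using datasets, sentence-transformers, and FAISS as stand-ins. The dataset, model name, and the sha256 stand-in for an IPFS CID are assumptions for illustration, not the actual laion-embeddings implementation.

```python
# Rough illustration of the described pipeline: embed a Hugging Face dataset,
# index the vectors, and key each row by a content hash. A real pipeline would
# use the IPFS CID (from an IPFS client) rather than this sha256 stand-in, and
# could target Qdrant or Elasticsearch instead of FAISS.
import hashlib

import faiss
import numpy as np
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

ds = load_dataset("ag_news", split="train[:1000]")        # any HF dataset
model = SentenceTransformer("all-MiniLM-L6-v2")           # any embedding model

texts = ds["text"]
emb = np.asarray(model.encode(texts, normalize_embeddings=True), dtype=np.float32)

index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

# Map FAISS row ids back to content identifiers so rows can be fetched later.
row_to_key = {i: hashlib.sha256(t.encode()).hexdigest() for i, t in enumerate(texts)}
```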

I intend to submit it to the Linux Foundation / OPEA group for inclusion in this project. It is not yet ready for inclusion, but right now I am clustering ~6 million rows of 1536-dimensional embeddings with FAISS, and the time per iteration is 2768 s for 4096 k-means clusters on a dual-socket Xeon E5-2690 v4 system. At that rate, 300 iterations would take about 230 hours (2768 s × 300 / 3600), i.e. nearly 10 days, which is why having AMX instructions that could reduce this by 5x is important.
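
For reference, that clustering run maps onto the faiss.Kmeans helper roughly as follows. This is a sketch with the sizes reported above; the embeddings file path is hypothetical, and any AMX benefit would again depend on the FAISS build.

```python
# Sketch of the k-means run described above: ~6M rows, 1536 dimensions,
# 4096 clusters, 300 iterations. "embeddings.f32" is a hypothetical raw
# float32 file; whether AMX is used depends on the FAISS build (PR #3266).
import faiss
import numpy as np

d, k, niter = 1536, 4096, 300
x = np.memmap("embeddings.f32", dtype=np.float32, mode="r").reshape(-1, d)

# Raise max_points_per_centroid so FAISS trains on all rows instead of
# subsampling to k * 256 points (its default behaviour).
kmeans = faiss.Kmeans(d, k, niter=niter, verbose=True, max_points_per_centroid=2048)
kmeans.train(np.ascontiguousarray(x, dtype=np.float32))

centroids = kmeans.centroids                  # (k, d) cluster centers
_, assignments = kmeans.index.search(np.ascontiguousarray(x, dtype=np.float32), 1)
```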

endomorphosis commented 1 week ago

The status of the FAISS PR is that there was a request to rebase the AMX branch and to refactor some of the methods: https://github.com/facebookresearch/faiss/pull/3266#pullrequestreview-2338153492

I believe that those were done in this recent commit: https://github.com/facebookresearch/faiss/pull/3266/commits/9e3432369ac0d665fde65cb23038217a2c70d931