NeoZhangJianyu opened this issue 1 month ago
@NeoZhangJianyu Hi
Which version of the TEI did you use for embedding? Did you implement reranking? If so, was it done on CPU or HPU?
Additionally, could you provide details about all images you used and how you built your ChatQnA pipeline?
I used the Docker image built from the Dockerfiles in this project. comment id: 3913c7bb3629b2964ff1ddf38a2e4b5359ea43bb
I did not implement reranking. The Docker build follows the scripts/commands in the ChatQnA guide: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/xeon/README.md
@NeoZhangJianyu
Embedding should work well on GNR. Later we will apply Neural Speed (internal) to speed up embedding serving.
We will also investigate this issue.
@NeoZhangJianyu @lvliang-intel
Hello, we’ve evaluated the embedding performance.
Our testing data shows a significant performance gap between TEI 1.2 and TEI 1.5. Please verify which TEI version you are currently using.
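To verify the TEI version, the server's `GET /info` endpoint can be queried; its JSON response includes the model and server details. A minimal sketch of parsing that response is below (the sample payload and the exact field name `version` are assumptions for illustration, not a captured response):

```python
import json

# Hypothetical sample of a TEI GET /info response body;
# in practice you would fetch http://<tei-host>:<port>/info.
sample_info = '{"model_id": "BAAI/bge-base-en-v1.5", "version": "1.5.0"}'

def tei_version(info_json: str) -> str:
    """Extract the server version string from a TEI /info response body."""
    return json.loads(info_json)["version"]

print(tei_version(sample_info))
```

Comparing this value against 1.2 vs 1.5 would confirm whether the reported performance gap applies to the deployed image.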
In OPEA v0.9, TEI performance should not be an issue.
It's great!
I set up the demo based on ChatQnA (TGI) on Xeon (GNR) and tried RAG through the UI. After uploading a PDF file (2-5 MB), a search query takes 10-15 s.
When uploading a text file with 3 lines, it takes 2-3 s.
The customer found that the slowdown occurs in the embedding stage.
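The gap between the 3-line text file and the multi-megabyte PDF is consistent with embedding cost scaling with the number of chunks the ingested document is split into. A rough back-of-the-envelope sketch (the chunk size and overlap values are illustrative assumptions, not the pipeline's actual settings):

```python
import math

def estimate_chunks(n_chars: int, chunk_size: int = 512, overlap: int = 64) -> int:
    """Estimate how many overlapping chunks a document of n_chars produces.

    chunk_size/overlap are hypothetical splitter settings for illustration.
    """
    if n_chars <= chunk_size:
        return 1
    stride = chunk_size - overlap  # characters of new text per chunk
    return math.ceil((n_chars - overlap) / stride)

# A 3-line text file (~200 chars) yields a single chunk,
# while ~1M extracted characters from a large PDF yields thousands,
# each requiring its own embedding call.
print(estimate_chunks(200))
print(estimate_chunks(1_000_000))
```

So a 10-15 s query after a large PDF upload versus 2-3 s after a tiny text file points at embedding throughput per chunk rather than retrieval itself.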