marella / chatdocs

Chat with your documents offline using AI.
MIT License
707 stars, 100 forks

Delay in get_retrieval_qa? #63

Closed (drvenabili closed 9 months ago)

drvenabili commented 1 year ago

Hi,

First of all, many thanks for this fantastic lib.

I'm encountering a delay before an answer is generated. This happens with a custom GGML model. Text generation itself is blazing fast (both with llama.cpp and through ctransformers, especially with layers offloaded to the GPU), which makes me think the model is not at fault.

With chatdocs I get a delay of about 8 s when chatting. Is this the price of the QA function? Or is there something wrong with the way I calculate the embeddings used to search the vector store?

Many thanks

Hardware:

Ananderz commented 1 year ago

The first time the model loads (on the first question), I also get the delay. After that it answers in milliseconds, it's blazing fast!
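
One way to check whether a delay like this is a one-time load cost is to time the first call against a later one. The sketch below is only an illustration and does not use the chatdocs API: `load_model`, `ask`, and `timed` are hypothetical names, and the 0.2 s sleep stands in for whatever expensive setup (model weights, embedding model, vector store) happens on first use.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model():
    """Hypothetical stand-in for an expensive one-time load."""
    time.sleep(0.2)  # simulated load time, not a real measurement
    # Return a trivial "model" that just echoes the question.
    return lambda question: f"answer to: {question}"


def ask(question):
    """Answer a question; load_model() is slow only on its first call."""
    model = load_model()
    return model(question)


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


_, first = timed(ask, "first question")
_, second = timed(ask, "second question")
print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```

If the real pipeline behaves the same way (slow first call, fast later calls), the delay is most likely load time rather than the cost of each query. If every call is slow, wrapping the retrieval step and the generation step separately in `timed` would show which one dominates.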