Hi,
First of all, many thanks for this fantastic lib.
I'm encountering a delay before an answer starts being formulated. This happens with a custom GGML model. Text generation itself is blazing fast (both with llama.cpp and through ctransformers, especially with layers offloaded to GPU), which makes me think the model isn't at fault.
With chatdocs I see an ~8s delay before each answer when chatting. Is this simply the cost of the QA step? Or is there something wrong with the way I calculate embeddings to search against the vector store?
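To help isolate where the time goes, something like the following minimal timing sketch could separate the embedding and retrieval steps from generation. The library choices here (sentence-transformers and chromadb, which I believe are close to chatdocs' defaults) and the model name are assumptions; adjust them to match the actual configuration:

```python
import time

# Hypothetical timing harness: measure embedding and vector-store lookup
# separately from LLM generation. Embedder and store are assumptions,
# not necessarily what chatdocs uses internally.
from sentence_transformers import SentenceTransformer
import chromadb

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = chromadb.Client()
collection = client.get_or_create_collection("docs")

query = "What does the document say about X?"

# Time the query-embedding step on its own.
t0 = time.perf_counter()
query_embedding = embedder.encode(query).tolist()
t1 = time.perf_counter()

# Time the vector-store similarity search on its own.
results = collection.query(query_embeddings=[query_embedding], n_results=4)
t2 = time.perf_counter()

print(f"embedding: {t1 - t0:.2f}s, retrieval: {t2 - t1:.2f}s")
```

If both numbers come out well under a second, the ~8s would have to be spent elsewhere (e.g. prompt-processing the retrieved context before generation starts).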
Many thanks
Hardware: