Hello, I have a question http://0.0.0.0:8001 the first answer (QueryFiles) is generated based on the added document, it is quite correct, then when asking the next question the answer is empty, although the terminal shows the answer, after refreshing the page and asking the question again, the answer is generated correctly but only once, you need to refresh the page again www (chrome-ubuntu22)

firefox the same

apart from that, the whole PROJECT works properly cuda12 1080TI

I'd love to know your Configurations

Polish Working Config

================================== create settings-polska.yaml PGPT_PROFILES=polska ./scripts/setup PGPT_PROFILES=polska python3.11 -m private_gpt

===========config================= local: llm_hf_repo_id: TheBloke/zephyr-7B-beta-pl-GGUF llm_hf_model_file: zephyr-7b-beta-pl.Q4_0.gguf embedding_hf_model_name: radlab/polish-gpt2-small-v2

llama, default or tag

prompt_style: "default"

llm: mode: llamacpp

Should be matching the selected model

max_new_tokens: 512 context_window: 4000 tokenizer: radlab/polish-sts-v2 temperature: 0.1

rag: similarity_top_k: 2

This value controls how many "top" documents the RAG returns to use in the context.

similarity_value: 0.45

This value is disabled by default. If you enable this settings, the RAG will only use articles that meet a certain percentage score.

rerank: enabled: false model: cross-encoder/ms-marco-MiniLM-L-2-v2 top_n: 1

llamacpp:

prompt_style: "mistral" llm_hf_repo_id: TheBloke/zephyr-7B-beta-pl-GGUF llm_hf_model_file: zephyr-7b-beta-pl.Q4_0.gguf tfs_z: 1.0 # # Próbkowanie bez ogona jest używane w celu zmniejszenia wpływu mniej prawdopodobnych tokenów na wyjściu. Wyższa wartość (np.> top_k: 60 # # Zmniejsza prawdopodobieństwo wygenerowania nonsensu. Wyższa wartość (np. 100) zapewni bardziej zróżnicowane odpowiedzi, nat> top_p: 0.9 # # Działa razem z top-k. Wyższa wartość (np. 0,95) spowoduje powstanie bardziej zróżnicowanego tekstu, natomiast niższa wartoś> repeat_penalty: 1.5 # # Ustawia siłę karania za powtórzenia. Wyższa wartość (np. 1,5) będzie bardziej karać za powtórzenia, podczas gdy niższa wart>

embedding:

Should be matching the value above in most cases

mode: huggingface ingest_mode: simple embed_dim: 384 # 384 is for sdadas/mmlw-e5-small

huggingface: embedding_hf_model_name: sdadas/mmlw-e5-small

vectorstore: database: qdrant

nodestore: database: simple

qdrant: path: local_data/private_gpt/qdrant

=============================================PL===========================================

zylon-ai / private-gpt

no answer, you need to refresh the website #1837