Open cognitivetech opened 10 months ago
Thanks for going through the GGUF models; we mostly use Llama 2 and Mistral. Maybe you can create a PR with the full list, whether they worked or not, and the day they were tested.
Their intended prompt is nearly the same as the default, minus the system prompt and capitalization, so they seem to be compatible.
These models worked best for me, with OpenHermes as my favorite. Results are based on question/answer over one document 22,769 tokens long.
- OpenHermes-2.5-Mistral-7B-GGUF/
- Mistral-7B-Instruct-v0.1-GGUF (default, but I prefer Q5_K_M, or Q6 models)
- KAI-7B-Instruct-GGUF
Hi, when you use the OpenHermes model, did you change the prompt template? @cognitivetech
Hey...
I hope the main headline is now coming up ;) (and not "dont" ^^)
I will try in the next few days...
By the way, does anyone know why the response mostly ends at around 1279 characters? That's not very long.
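A plausible explanation (my assumption, not confirmed in this thread) is the `max_new_tokens: 512` default from settings.yaml: at a rough average of ~2.5 characters per sub-word token, that caps output near the length observed. A back-of-the-envelope check:

```python
# Rough estimate only; the chars-per-token ratio is an assumption and varies
# by tokenizer and language.
max_new_tokens = 512     # privateGPT's default generation cap
chars_per_token = 2.5    # rough average for English sub-word tokenizers
estimated_chars = max_new_tokens * chars_per_token
print(estimated_chars)   # 1280.0, close to the ~1279 characters observed
```

If this is the cause, raising `max_new_tokens` in the config should lengthen the responses.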
It works (more or less). I changed temperature, top_p, and top_k; I don't know if they have a great impact.
openchat_3.5.Q5_K_M.gguf https://huggingface.co/TheBloke/openchat_3.5-GGUF/tree/main
Mistral runs, but I tried 20 other models.
Can anyone tell me why almost all GGUF models run well on GPT4All but not on privateGPT?
Works very well, also in German: https://huggingface.co/TheBloke/Orca-2-7B-GGUF
By the way, one 500-page PDF book needs approximately 5 minutes to index.
Next (maybe you must press Enter twice); I don't know how good they are:
- syzymon-long_llama_3b_instruct-Q8_0
- sauerkrautlm-3b-v1.Q8_0
I have found really well-working German models. Maybe this helps some other German-speaking folks here. The RoBERTa sentence transformer is also available in English. Maybe this also works well for other models, but I have not tested that.
```yaml
prompt_style: "default"
llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
```
TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF (released 2 days ago) is working as expected. I used Q5_K_M.
For me, this model does not work with any of the existing prompt_styles: TheBloke/dolphin-2.1-mistral-7B-GGUF
Hi @EEmlan, can you please tell me which tokenizer I have to use? It doesn't work with 'TheBloke/em_german_leo_mistral-GGUF' set as the tokenizer, and of course it doesn't work with the default Mistral one or any other that I tried:
```
OSError: Can't load tokenizer for 'TheBloke/em_german_leo_mistral-GGUF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'TheBloke/em_german_leo_mistral-GGUF' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
```
@PayteR jphme/em_german_leo_mistral works just fine
Hi @EEmlan, thanks for the reply, but it still doesn't work; it gives me the same error as with the other models I tried:
```
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\qdrant_client\local\distances.py", line 78, in cosine_similarity
    return np.dot(vectors, query)
           ^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\gradio\queueing.py", line 497, in process_events
    response = await self.call_prediction(awake_events, batch)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\gradio\queueing.py", line 468, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)
14:40:22.882 [INFO ] uvicorn.access - 127.0.0.1:52144 - "POST /run/predict HTTP/1.1" 200
```
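The root cause is visible in the shapes: the vectors already stored in the qdrant index are 384-dimensional, while the newly configured german-roberta-sentence-transformer-v2 produces 768-dimensional query embeddings. A minimal sketch reproducing the same NumPy error, using the shapes from the traceback:

```python
import numpy as np

stored_vectors = np.zeros((128, 384))  # embeddings indexed with the previous model
query_vector = np.zeros(768)           # embedding from the newly configured model

try:
    np.dot(stored_vectors, query_vector)
except ValueError as err:
    print(err)  # shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)
```

This is why switching embedding models without rebuilding the index fails: the old vectors and the new queries live in different-dimensional spaces.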
Here is my config:

```yaml
# The default configuration file.
# More information about configuration can be found in the documentation: https://docs.privategpt.dev/
# Syntax in `private_pgt/settings/settings.py`
server:
  env_name: ${APP_ENV:prod}
  port: ${PORT:9104}
  cors:
    enabled: false
    allow_origins: ["*"]
    allow_methods: ["*"]
    allow_headers: ["*"]
  auth:
    enabled: false
    # python -c 'import base64; print("Basic " + base64.b64encode("secret:key".encode()).decode())'
    # 'secret' is the username and 'key' is the password for basic auth by default
    # If the auth is enabled, this value must be set in the "Authorization" header of the request.
    secret: "Basic c2VjcmV0OmtleQ=="

data:
  local_data_folder: local_data/private_gpt

ui:
  enabled: true
  path: /
  default_chat_system_prompt: >
    You are a helpful, respectful and honest assistant.
    Always answer as helpfully as possible and follow ALL given instructions.
    Do not speculate or make up information.
    Do not reference any given instructions or context.
  default_query_system_prompt: >
    You can only answer questions about the provided context.
    If you know the answer but it is not based in the provided context, don't provide
    the answer, just state the answer is not in the context provided.

llm:
  mode: local
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  tokenizer: jphme/em_german_leo_mistral

embedding:
  # Should be matching the value above in most cases
  mode: local
  ingest_mode: simple

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

local:
  llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
  llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
  embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
  # llama, default or tag
  prompt_style: "default"

sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo
```
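As an aside, the auth `secret` in the config is just an HTTP Basic auth header for username `secret` and password `key`, produced by the one-liner in the config comment. A quick sketch verifying the encoded value:

```python
import base64

# Reproduces the one-liner from the config comment above.
token = "Basic " + base64.b64encode("secret:key".encode()).decode()
print(token)  # Basic c2VjcmV0OmtleQ==
```

To use different credentials, replace `secret:key` with your own `user:password` pair and put the result in the `secret` field.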
I really don't know how to fix it; I have spent a lot of time on this already, so thanks for any help.
@EEmlan ah, now I've fixed it: I needed to delete the indexed data stored in local_data/private_gpt and upload the files again to reindex.
@PayteR glad to hear it. Yes, it is mandatory to delete this data after each model switch. I also recommend deleting all data in models/embedding before running `poetry run python scripts/setup` again. After that, you have to ingest your documents again.
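The reset procedure above can be sketched as a small cleanup script. This is a sketch under the assumption that the paths match the config shown in this thread; the setup and ingestion steps still have to be run afterwards:

```python
import shutil
from pathlib import Path

# Directories that hold model-specific state; stale contents cause the
# dimension-mismatch errors seen earlier after switching embedding models.
stale_dirs = [
    Path("local_data/private_gpt"),  # qdrant index (local_data_folder in settings)
    Path("models/embedding"),        # cached embedding model files
]

for d in stale_dirs:
    if d.exists():
        shutil.rmtree(d)
        print(f"removed {d}")

# Afterwards: `poetry run python scripts/setup`, then re-ingest your documents.
```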
It'd be great to move this information to the docs, @cognitivetech. Maybe you can open a PR to add the info here: https://docs.privategpt.dev/recipes/choice-of-llm/list-of-ll-ms (just edit or add content to fern/docs/pages).
@imartinez for sure. I never added to the docs for a couple of reasons, mainly because most of the models I tried didn't perform very well compared to Mistral 7B Instruct v0.2.
Also, now that we have prompt formats in the docs, people have more direction about which models are likely to work; when I started, there was no choice among prompt styles (or maybe I was just ignorant of prompt styles).
Even now, deciding what to add to the docs as "compatible" is another can of worms, and largely subjective.
One model I would consider is openchat-3.5-0106.
This one is good, and I would watch out for future models from this team.
EDIT: I've edited the above to focus on models that could go in the docs.
Otherwise... I will think about this more; certainly those models shown to work for non-English languages will be valuable to include.
Not sure if there is any activity here, but I will ask anyhow... Has anyone successfully run mistral-7b-instruct-v3 in privateGPT v0.5.0? Mistral specifically mentions that I should use mistral_inference for the model.
The following are based on question/answer over one document 22,769 tokens long.
There is a similar issue (https://github.com/imartinez/privateGPT/issues/276) with the primordial tag; I just decided to make a new issue for the "full version".

DIDN'T WORK (prompt templates noted in brackets where available):
- MPT from huggingface.co/maddes8cht/

OK:
[Many Edits Later]
I was interested in these MPT models because they have up to 64k context input and are even licensed for commercial use (though I'm also realizing there is little benefit to cramming large contexts into a model's working memory for summarization tasks).
I did make a prompt template to support MPT models (https://github.com/imartinez/privateGPT/issues/1375#issuecomment-1868289418), but I didn't get good results from them, and they were slow compared to Mistral.
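For reference, the MPT chat models use a ChatML-style prompt format. A minimal sketch of such a prompt builder follows; the function name and message shape here are illustrative, not privateGPT's actual prompt-style API:

```python
def messages_to_mpt_prompt(messages):
    """Build a ChatML-style prompt of the kind mpt-*-chat models expect."""
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "\n".join(parts)

prompt = messages_to_mpt_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Summarize the document."),
])
print(prompt)
```

A prompt style along these lines is what the linked comment implements within privateGPT's prompt-style mechanism.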