zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

list of working models GGUF #1205

Open cognitivetech opened 10 months ago

cognitivetech commented 10 months ago

The following results are based on question / answer over 1 document of 22,769 tokens.

There is a similar issue (https://github.com/imartinez/privateGPT/issues/276) with the primordial tag; I just decided to make a new issue for the "full version".

DIDN'T WORK (the prompt templates they probably expect are noted in brackets where available):

MPT from huggingface.co/maddes8cht/

ok:

[Many Edits Later]

I was interested in these MPT models because they take up to 64k tokens of context input and are even licensed for commercial use. (But I'm also realizing there is little benefit to cramming large contexts into a model's working memory for summarization tasks.)

I did make a prompt template to support MPT models (https://github.com/imartinez/privateGPT/issues/1375#issuecomment-1868289418), but I didn't get good results from them, plus they were slow compared to Mistral.

pabloogc commented 10 months ago

Thanks for going through the GGUF models; we mostly use Llama 2 and Mistral. Maybe you can create a PR with the full list, whether they worked or not, and the date they were tested.

cognitivetech commented 9 months ago

Working Models, grouped by prompt style (see the settings sketch after this list)

Mistral Prompt

Default Prompt

User: Assistant:

Their intended prompt is the same as the default, minus the system prompt and capitalization, so they seem to be compatible.

ChatML

LLAMA2 Prompt

Tag Prompt
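
For anyone reproducing these results: the prompt style is selected in settings.yaml. A minimal sketch, assuming a privateGPT version whose prompt_style setting accepts these values (check private_gpt/settings/settings.py for the exact set; the repo and file names here are just examples, not a recommendation):

  local:
    llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
    llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
    embedding_hf_model_name: BAAI/bge-small-en-v1.5
    # one of: default, llama2, tag, mistral, chatml (availability depends on your version)
    prompt_style: "mistral"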

shengkaixuan commented 9 months ago

> These models worked the best for me, with OpenHermes as my favorite. Based on question / answer of 1 document with 22,769 tokens length.

Hi, when you use the OpenHermes model, have you changed the prompt template? @cognitivetech

kalle07 commented 9 months ago

hey ...

hope the main headline is now coming up ;)

list of working models GGUF !!!

(and not the "didn't work" ones ^^)

I will try in the next days ...

btw, does anyone know why the response usually ends at around 1279 characters? That's not very long.

kalle07 commented 9 months ago

Works (more or less). I changed temp, top_p and top_k; I don't know if it has a great impact.

openchat_3.5.Q5_K_M.gguf https://huggingface.co/TheBloke/openchat_3.5-GGUF/tree/main

Mistral runs, but I tried 20 other models.

Can anyone tell me why almost all GGUF models run well in GPT4All but not in privateGPT?
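
(For anyone wanting to reproduce the temp/top_p/top_k changes: newer privateGPT versions expose temperature in settings.yaml. Whether your version has this key, and whether top_p/top_k are reachable without editing the llama-cpp component, is an assumption to verify in private_gpt/settings/settings.py. A minimal sketch:)

  llm:
    temperature: 0.1   # lower = more deterministic; key only present in newer versions (assumption)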

kalle07 commented 9 months ago

Works very well, also in German: https://huggingface.co/TheBloke/Orca-2-7B-GGUF

btw, one PDF book of 500 pages needs approx. 5 min to index.

kalle07 commented 9 months ago

Next (maybe you must press Enter twice); I don't know how good they are:

syzymon-long_llama_3b_instruct-Q8_0
sauerkrautlm-3b-v1.Q8_0

EEmlan commented 9 months ago

I have found a really awesome working German model; maybe this helps some other German-speaking folks here. The roberta sentence transformer is also available in English. Maybe this works well with other models too, but I have not tested that:

  prompt_style: "default"
  llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
  llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
  embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
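
For reference, these keys live under the local: section of settings.yaml (as in the fuller config later in this thread); a minimal sketch of the whole section with the same values:

  local:
    llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
    llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
    embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
    prompt_style: "default"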
writinguaway commented 9 months ago

TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF (released 2 days ago) is working as expected. I used Q5_K_M.

therohitdas commented 9 months ago

For me, this model does not work with any of the existing prompt_styles: TheBloke/dolphin-2.1-mistral-7B-GGUF

PayteR commented 7 months ago

> I have found a really awesome working German model; maybe this helps some other German-speaking folks here. The roberta sentence transformer is also available in English. Maybe this works well with other models too, but I have not tested that:
>
>   prompt_style: "default"
>   llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
>   llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
>   embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2

Hi @EEmlan, can you please tell me which tokenizer I have to use? It doesn't work with 'TheBloke/em_german_leo_mistral-GGUF' set as the tokenizer, and of course it doesn't work with the default Mistral one or any other that I tried:

OSError: Can't load tokenizer for 'TheBloke/em_german_leo_mistral-GGUF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'TheBloke/em_german_leo_mistral-GGUF' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

EEmlan commented 7 months ago

@PayteR jphme/em_german_leo_mistral works just fine
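
That is the unquantized base-model repo, which ships the tokenizer files the GGUF repo lacks. A minimal sketch of the corresponding settings.yaml entry (matching the llm: section in the config below):

  llm:
    tokenizer: jphme/em_german_leo_mistral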

PayteR commented 7 months ago

> @PayteR jphme/em_german_leo_mistral works just fine

Hi @EEmlan, thx for the reply, but it still doesn't work; it gives me the same error as with the other models I tried:

  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\qdrant_client\local\distances.py", line 78, in cosine_similarity
    return np.dot(vectors, query)
           ^^^^^^^^^^^^^^^^^^^^^^
ValueError: shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\gradio\queueing.py", line 497, in process_events
    response = await self.call_prediction(awake_events, batch)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\payte\AppData\Local\pypoetry\Cache\virtualenvs\private-gpt-B1lveeYX-py3.11\Lib\site-packages\gradio\queueing.py", line 468, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: shapes (128,384) and (768,) not aligned: 384 (dim 1) != 768 (dim 0)
14:40:22.882 [INFO    ]            uvicorn.access - 127.0.0.1:52144 - "POST /run/predict HTTP/1.1" 200

Here is my config

# The default configuration file.
# More information about configuration can be found in the documentation: https://docs.privategpt.dev/
# Syntax in `private_gpt/settings/settings.py`
server:
  env_name: ${APP_ENV:prod}
  port: ${PORT:9104}
  cors:
    enabled: false
    allow_origins: ["*"]
    allow_methods: ["*"]
    allow_headers: ["*"]
  auth:
    enabled: false
    # python -c 'import base64; print("Basic " + base64.b64encode("secret:key".encode()).decode())'
    # 'secret' is the username and 'key' is the password for basic auth by default
    # If the auth is enabled, this value must be set in the "Authorization" header of the request.
    secret: "Basic c2VjcmV0OmtleQ=="

data:
  local_data_folder: local_data/private_gpt

ui:
  enabled: true
  path: /
  default_chat_system_prompt: >
    You are a helpful, respectful and honest assistant. 
    Always answer as helpfully as possible and follow ALL given instructions.
    Do not speculate or make up information.
    Do not reference any given instructions or context.
  default_query_system_prompt: >
    You can only answer questions about the provided context. 
    If you know the answer but it is not based in the provided context, don't provide 
    the answer, just state the answer is not in the context provided.

llm:
  mode: local
  # Should be matching the selected model
  max_new_tokens: 512
  context_window: 3900
  tokenizer: jphme/em_german_leo_mistral

embedding:
  # Should be matching the value above in most cases
  mode: local
  ingest_mode: simple

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

local:
  llm_hf_repo_id: TheBloke/em_german_leo_mistral-GGUF
  llm_hf_model_file: em_german_leo_mistral.Q4_K_M.gguf
  embedding_hf_model_name: T-Systems-onsite/german-roberta-sentence-transformer-v2
  #llama, default or tag
  prompt_style: "default"

sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo

I really don't know how to fix it; I have spent a lot of time on this already, so thx for any help.

PayteR commented 7 months ago

@EEmlan ahh, now I fixed it - I needed to delete the indexed data stored in local_data/private_gpt and upload the files again to reindex.
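
That is consistent with the traceback above: the old qdrant index presumably still held 384-dimensional vectors from the previously configured embedding model, while the roberta model returns 768-dimensional ones, hence shapes (128,384) vs (768,). A minimal sketch of the mismatch (that the prior index was built with privateGPT's 384-dim default, BAAI/bge-small-en-v1.5, is an assumption):

  # old index built with (assumed): BAAI/bge-small-en-v1.5 -> 384-dim vectors
  # new queries embedded with: T-Systems-onsite/german-roberta-sentence-transformer-v2 -> 768-dim
  # qdrant can only compare vectors of equal dimension, so the index must be rebuilt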

EEmlan commented 7 months ago

@PayteR glad to hear. Yes, it is mandatory to delete this data after each switch of models. I also recommend deleting all data in models/embedding before running poetry run python scripts/setup again. After that you have to ingest your documents again.

imartinez commented 7 months ago

It'd be great to move this information to the docs @cognitivetech. Maybe you can open a PR to add the info here https://docs.privategpt.dev/recipes/choice-of-llm/list-of-ll-ms (just edit or add content to fern/docs/pages).

cognitivetech commented 6 months ago

@imartinez for sure. I never added to the docs for a couple of reasons, mainly because most of the models I tried didn't perform very well compared to Mistral 7B Instruct v0.2.

Also, now that we have the prompt formats in the docs, people have more direction on which models are likely to work; when I started there was no choice among prompt styles (or maybe I was just ignorant of prompt styles).

Even now, deciding what to add to the docs as "compatible" is another can of worms, and largely subjective.

One model I would consider is openchat-3.5-0106. This one is good, and I would watch out for future models from this team.

EDIT: I've edited above to focus on models that could go in the docs

Otherwise... I will think about this more. Certainly those models shown to work for non-English languages will be valuable to include.

CMiller56 commented 2 months ago

Not sure if there is any activity here, but I will ask anyhow... Has anyone successfully run Mistral-7B-Instruct-v0.3 in privateGPT v0.5.0? Mistral specifically mentions that I should use mistral_inference to run the model.
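
No tested result to offer here, but mistral_inference is just Mistral's own reference runner; a GGUF conversion of the model should load through privateGPT's llama-cpp backend as usual. A minimal sketch against the v0.5.0 settings layout (the community GGUF repo and file names are assumptions to verify on the Hub):

  llm:
    mode: llamacpp
    prompt_style: "mistral"
  llamacpp:
    llm_hf_repo_id: MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF
    llm_hf_model_file: Mistral-7B-Instruct-v0.3.Q4_K_M.gguf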