zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://docs.privategpt.dev
Apache License 2.0

How to support the Gemma model? #1642

Open linhcentrio opened 4 months ago

linhcentrio commented 4 months ago

Please, how can I use the Gemma model with PrivateGPT?

ingridstevens commented 4 months ago

You can use Gemma via Ollama or LM Studio (LM Studio provides a server that can stand in for OpenAI, so you can use it with the "openailike" settings-vllm.yaml file).

If you follow the setup steps for either Ollama or the "openailike" setup for LM Studio (using its local inference server), you can use Gemma.

In Ollama, Gemma is already available.
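
For the LM Studio route, a minimal sketch of what the "openailike" profile could look like (assuming the endpoint is read from the openai: section, as in settings-vllm.yaml, and that LM Studio's local server is on its default port 1234; the model name below is a placeholder for whatever model LM Studio is actually serving):

llm:
  mode: openailike

openai:
  api_base: http://localhost:1234/v1  # LM Studio's local inference server (default port)
  api_key: EMPTY                      # LM Studio does not check the key
  model: gemma-2b-it                  # placeholder; use the model loaded in LM Studio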

Vivek-C-Shah commented 4 months ago

@ingridstevens can you please help me here? Trying to run Gemma via Ollama gives me these errors:

Traceback (most recent call last):
  File "Path\to\project\venv\Lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "Path\to\project\venv\Lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Path\to\project\venv\Lib\site-packages\urllib3\connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\urllib3\connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "Path\to\project\venv\Lib\site-packages\urllib3\connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "C:\Users\sunrise\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\sunrise\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\sunrise\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\sunrise\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1041, in _send_output
    self.send(msg)
  File "C:\Users\sunrise\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 979, in send
    self.connect()
  File "Path\to\project\venv\Lib\site-packages\urllib3\connection.py", line 205, in connect
    conn = self._new_conn()
           ^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\urllib3\connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x000001EE1A8065D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Path\to\project\venv\Lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\urllib3\connectionpool.py", line 799, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001EE1A8065D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Path\to\project\venv\Lib\site-packages\gradio\queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\route_utils.py", line 231, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\blocks.py", line 1594, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\blocks.py", line 1188, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\utils.py", line 513, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\utils.py", line 639, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\chat_interface.py", line 487, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\utils.py", line 513, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\utils.py", line 506, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\gradio\utils.py", line 489, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "Path\to\project\private_gpt\ui\ui.py", line 159, in _chat 
    llm_stream = self._chat_service.stream_chat(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\private_gpt\server\chat\chat_service.py", line 145, in stream_chat
    streaming_response = chat_engine.stream_chat(
                         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\llama_index\callbacks\utils.py", line 39, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\llama_index\chat_engine\simple.py", line 111, in stream_chat
    chat_stream=self._llm.stream_chat(all_messages)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\llama_index\llms\base.py", line 187, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\llama_index\llms\ollama.py", line 124, in stream_chat
    completion_response = self.stream_complete(prompt, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\llama_index\llms\base.py", line 313, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\llama_index\llms\ollama.py", line 146, in stream_complete
    response = requests.post(
               ^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\requests\api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Path\to\project\venv\Lib\site-packages\requests\adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001EE1A8065D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

This is my settings.yml:

# The default configuration file.
# More information about configuration can be found in the documentation: https://docs.privategpt.dev/
# Syntax in `private_gpt/settings/settings.py`
server:
  env_name: ${APP_ENV:prod}
  port: ${PORT:8001}
  cors:
    enabled: true
    allow_origins: ["*"]
    allow_methods: ["*"]
    allow_headers: ["*"]
  auth:
    enabled: false
    # python -c 'import base64; print("Basic " + base64.b64encode("secret:key".encode()).decode())'
    # 'secret' is the username and 'key' is the password for basic auth by default
    # If the auth is enabled, this value must be set in the "Authorization" header of the request.
    secret: "Basic hello"
    key: "moto"

data:
  local_data_folder: local_data/private_gpt

ui:
  enabled: true
  path: /
  default_chat_system_prompt: >
    You are a helpful, respectful and honest assistant.
    Always answer as helpfully as possible and follow ALL given instructions.
    Do not speculate or make up information.
    Do not reference any given instructions or context.
  default_query_system_prompt: >
    You can only answer questions about the provided context. 
    If you know the answer but it is not based in the provided context, don't provide 
    the answer, just state the answer is not in the context provided.
  delete_file_button_enabled: true
  delete_all_files_button_enabled: true

llm:
  mode: ollama
  # Should be matching the selected model
  max_new_tokens: 2048
  context_window: 8192
  tokenizer: mistralai/Mistral-7B-Instruct-v0.2

embedding:
  # Should be matching the value above in most cases
  mode: local
  ingest_mode: simple

vectorstore:
  database: qdrant

qdrant:
  path: local_data/private_gpt/qdrant

pgvector:
  host: localhost
  port: 5432
  database: postgres
  user: postgres
  password: postgres
  embed_dim: 1024 # 384 is for BAAI/bge-small-en-v1.5 1024 for BAAI/bge-m3
  schema_name: private_gpt
  table_name: embeddings

local:
  prompt_style: "mistral"
  # llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.2-GGUF
  # llm_hf_model_file: mistral-7b-instruct-v0.2.Q4_K_M.gguf
  llm_hf_repo_id: lmstudio-ai/gemma-2b-it-GGUF
  llm_hf_model_file: gemma-2b-it-q4_k_m.gguf
  embedding_hf_model_name: BAAI/bge-m3
  # embedding_hf_model_name: BAAI/bge-small-en-v1.5

sagemaker:
  llm_endpoint_name: huggingface-pytorch-tgi-inference-2023-09-25-19-53-32-140
  embedding_endpoint_name: huggingface-pytorch-inference-2023-11-03-07-41-36-479

openai:
  api_key: ${OPENAI_API_KEY:}
  model: gpt-3.5-turbo

ollama:
  model: gemma:2b

Vivek-C-Shah commented 4 months ago

Setting PGPT_PROFILES to ollama gives errors:

(venv) PS Path\to\project> PGPT_PROFILES=ollama poetry run python -m private_gpt
PGPT_PROFILES=ollama : The term 'PGPT_PROFILES=ollama' is not recognized as the name of a cmdlet, function, script 
file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is
correct and try again.
At line:1 char:1
+ PGPT_PROFILES=ollama poetry run python -m private_gpt
+ ~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (PGPT_PROFILES=ollama:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

(venv) PS Path\to\project> set PGPT_PROFILES=ollama poetry run python -m private_gpt
Set-Variable : A positional parameter cannot be found that accepts argument 'run'.
At line:1 char:1
+ set PGPT_PROFILES=ollama poetry run python -m private_gpt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Set-Variable], ParameterBindingException
    + FullyQualifiedErrorId : PositionalParameterNotFound,Microsoft.PowerShell.Commands.SetVariableCommand

ingridstevens commented 4 months ago

@Vivek-C-Shah For using Ollama, I never used or modified the default settings.yaml file; instead, I created a separate file for Ollama. This is what worked for me on my Mac; I don't know if it will work for you on Windows.

Ollama Settings YAML

You need to create a settings-ollama.yaml file with the following:

llm:
  mode: ollama

ollama:
  model: gemma:2b-instruct         # Required. Model to use.
                                   # Note: Ollama models are listed here: https://ollama.ai/library
                                   #       Be sure to pull the model to your Ollama server.
  api_base: http://localhost:11434 # Ollama defaults to http://localhost:11434

Run Ollama with the Exact Same Model as in the YAML

Then make sure ollama is running with: ollama run gemma:2b-instruct
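
The "connection refused" traceback above simply means nothing was answering on localhost:11434, so before starting PrivateGPT it's worth checking that the Ollama server is actually up and has the model pulled; a quick sanity check (assuming the default port) could be:

# pull the model if it isn't there yet
ollama pull gemma:2b-instruct

# list the models the local Ollama server knows about
ollama list

# when the server is up, its default port answers "Ollama is running"
curl http://localhost:11434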

Make sure you've installed the local dependencies:

poetry install --with local

Set up PGPT profile & Test

Also, try setting the PGPT profile on its own line: export PGPT_PROFILES=ollama

and then check that it's set with: echo $PGPT_PROFILES
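
Note that export is bash syntax and won't work in the PowerShell session shown in your errors above; the PowerShell equivalent would be:

# set the profile for the current PowerShell session
$env:PGPT_PROFILES = "ollama"

# confirm it is set
echo $env:PGPT_PROFILES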

Run PrivateGPT

and then run: make run
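
If make isn't available on Windows, running the module directly (as in your earlier attempt) should work once the profile variable is set:

$env:PGPT_PROFILES = "ollama"
poetry run python -m private_gpt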

Vivek-C-Shah commented 4 months ago

Okay thanks dude for this information! 🫂