h2oai / h2ogpt

Private chat with local GPT with documents, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

Running H2ogpt with Ollama inference Server #1670

Closed: rohitnanda1443 closed this issue 5 days ago

rohitnanda1443 commented 3 weeks ago

Hi All,

I am trying to run an inference server with Ollama using the command below:

ollama run mistral:v0.3

Then I run h2oGPT using the command below:

python generate.py --guest_name='' --base_model=mistralai/Mistral-7B-Instruct-v0.3 --max_seq_len=8094 --enable_tts=False --enable_stt=False --enable_transcriptions=False --use_gpu_id=False --inference_server=vllm_chat:http://localhost:11434/v1/ --prompt_type=openai_chat &

The issue I am facing is that the model is not found in the h2oGPT UI. What is the correct model name to pass on the h2oGPT CLI for the Ollama mistral:v0.3 model?
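
A quick way to see which model names the local Ollama endpoint is actually serving is to query its OpenAI-compatible API. This is a minimal sketch, assuming the openai Python package is installed and Ollama is on its default port:

# Optional sanity check: list the model names the local Ollama
# OpenAI-compatible endpoint exposes, i.e. the strings h2oGPT must be given.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="EMPTY")
for model in client.models.list().data:
    print(model.id)  # expected to include something like "mistral:v0.3"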

pseudotensor commented 3 weeks ago

Did you follow this?

https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#running-ollama-vs-h2ogpt-as-inference-server

pseudotensor commented 5 days ago

As in the instructions, --base_model has to be the same name the model was given in Ollama (here mistral:v0.3), not the Hugging Face model ID.

i.e.

python generate.py --guest_name='' --base_model=mistral:v0.3 --max_seq_len=8094 --enable_tts=False --enable_stt=False --enable_transcriptions=False --use_gpu_id=False --inference_server=vllm_chat:http://localhost:11434/v1/ --prompt_type=openai_chat

and ignore errors about not finding the tokenizer, etc.
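
One way to confirm the name is valid before pointing h2oGPT at it is to send a chat request straight to the Ollama endpoint with the same model string passed to --base_model. A minimal sketch, again assuming the openai Python package and the default port:

# Send a chat request to the vllm_chat/OpenAI-style endpoint h2oGPT will use,
# with the same model name given to --base_model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="EMPTY")
resp = client.chat.completions.create(
    model="mistral:v0.3",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)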

For more accurate tokenization, specify the tokenizer and a Hugging Face token (because the mistralai models are gated on HF):

python generate.py --guest_name='' --base_model=mistral:v0.3 --tokenizer_base_model=mistralai/Mistral-7B-Instruct-v0.3 --max_seq_len=8094 --enable_tts=False --enable_stt=False --enable_transcriptions=False --use_gpu_id=False --inference_server=vllm_chat:http://localhost:11434/v1/ --prompt_type=openai_chat --use_auth_token=<token>
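
Roughly, this points h2oGPT at the real Mistral tokenizer for token counting instead of a fallback. A minimal sketch of what that Hugging Face tokenizer load looks like, assuming the transformers package and an HF access token with access to the gated repo (the token string below is a placeholder):

# Load the gated Mistral tokenizer with an HF access token; this is
# approximately what --tokenizer_base_model and --use_auth_token point h2oGPT at.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    token="<token>",  # placeholder; substitute your own Hugging Face token
)
print(len(tok("How many tokens is this prompt?")["input_ids"]))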
