h2oai / h2ogpt

Private chat with local GPT with documents, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

Running H2ogpt with Ollama inference Server #1670

Closed: rohitnanda1443 closed this issue 5 days ago

rohitnanda1443 commented 3 weeks ago

Hi All,

I am trying to run an inference server with Ollama using the command below:

ollama run mistral:v0.3

Then I run h2oGPT using the command below:

python generate.py --guest_name='' --base_model=mistralai/Mistral-7B-Instruct-v0.3 --max_seq_len=8094 --enable_tts=False --enable_stt=False --enable_transcriptions=False --use_gpu_id=False --inference_server=vllm_chat:http://localhost:11434/v1/ --prompt_type=openai_chat &

The issue I am facing is that the model is not found in the h2oGPT UI. What is the correct model name to pass on the h2oGPT CLI for the Ollama mistral:v0.3 model?
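
A quick way to see which model names the local Ollama endpoint is actually serving is to query its OpenAI-compatible API. This is a minimal sketch, assuming the openai Python package is installed and Ollama is on its default port:

# Optional sanity check: list the model names the local Ollama
# OpenAI-compatible endpoint exposes, i.e. the strings h2oGPT must be given.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="EMPTY")
for model in client.models.list().data:
    print(model.id)  # expected to include something like "mistral:v0.3"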

pseudotensor commented 3 weeks ago

Did you follow this?

https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#running-ollama-vs-h2ogpt-as-inference-server

pseudotensor commented 5 days ago

As in the instructions, --base_model has to be the same name the model was given in Ollama (here mistral:v0.3), not the Hugging Face model ID.

i.e.

python generate.py --guest_name='' --base_model=mistral:v0.3 --max_seq_len=8094 --enable_tts=False --enable_stt=False --enable_transcriptions=False --use_gpu_id=False --inference_server=vllm_chat:http://localhost:11434/v1/ --prompt_type=openai_chat

and ignore errors about not finding the tokenizer, etc.
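
One way to confirm the name is valid before pointing h2oGPT at it is to send a chat request straight to the Ollama endpoint with the same model string passed to --base_model. A minimal sketch, again assuming the openai Python package and the default port:

# Send a chat request to the vllm_chat/OpenAI-style endpoint h2oGPT will use,
# with the same model name given to --base_model.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="EMPTY")
resp = client.chat.completions.create(
    model="mistral:v0.3",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)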

For more accurate tokenization, specify the tokenizer and a Hugging Face token (because the mistralai models are gated on HF):

python generate.py --guest_name='' --base_model=mistral:v0.3 --tokenizer_base_model=mistralai/Mistral-7B-Instruct-v0.3 --max_seq_len=8094 --enable_tts=False --enable_stt=False --enable_transcriptions=False --use_gpu_id=False --inference_server=vllm_chat:http://localhost:11434/v1/ --prompt_type=openai_chat --use_auth_token=<token>
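
Roughly, this points h2oGPT at the real Mistral tokenizer for token counting instead of a fallback. A minimal sketch of what that Hugging Face tokenizer load looks like, assuming the transformers package and an HF access token with access to the gated repo (the token string below is a placeholder):

# Load the gated Mistral tokenizer with an HF access token; this is
# approximately what --tokenizer_base_model and --use_auth_token point h2oGPT at.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    token="<token>",  # placeholder; substitute your own Hugging Face token
)
print(len(tok("How many tokens is this prompt?")["input_ids"]))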
