Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/
I wanted to use my working local llama.cpp server as the inference server. I looked here and set `--inference_server="http://localhost:8080/v1"`, but it doesn't work:
```
HF Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Starting get_model: gpt-3.5-turbo http://localhost:8080/v1
GR Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Loaded as API: http://localhost:8080/v1/ ✔
GR Client Failed http://localhost:8080/v1 gpt-3.5-turbo: Could not fetch config for http://localhost:8080/v1/
HF Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Starting get_model: gpt-3.5-turbo http://localhost:8080/v1
GR Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Loaded as API: http://localhost:8080/v1/ ✔
GR Client Failed http://localhost:8080/v1 gpt-3.5-turbo: Could not fetch config for http://localhost:8080/v1/
HF Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Starting get_model: gpt-3.5-turbo http://localhost:8080/v1
GR Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Loaded as API: http://localhost:8080/v1/ ✔
GR Client Failed http://localhost:8080/v1 gpt-3.5-turbo: Could not fetch config for http://localhost:8080/v1/
HF Client Begin: http://localhost:8080/v1 gpt-3.5-turbo
Traceback (most recent call last):
  File "/home/fae/h2ogpt/generate.py", line 20, in <module>
    entrypoint_main()
  File "/home/fae/h2ogpt/generate.py", line 16, in entrypoint_main
    H2O_Fire(main)
  File "/home/fae/h2ogpt/src/utils.py", line 73, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "/home/fae/h2ogpt/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/fae/h2ogpt/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/fae/h2ogpt/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/fae/h2ogpt/src/gen.py", line 2347, in main
    model0, tokenizer0, device = get_model_retry(reward_type=False,
  File "/home/fae/h2ogpt/src/gen.py", line 2718, in get_model_retry
    model1, tokenizer1, device1 = get_model(**kwargs)
  File "/home/fae/h2ogpt/src/gen.py", line 3021, in get_model
    inference_server, gr_client, hf_client = get_client_from_inference_server(inference_server,
  File "/home/fae/h2ogpt/src/gen.py", line 2699, in get_client_from_inference_server
    res = hf_client.generate('What?', max_new_tokens=1)
  File "/home/fae/h2ogpt/venv/lib/python3.10/site-packages/text_generation/client.py", line 284, in generate
    raise parse_error(resp.status_code, payload)
text_generation.errors.NotFoundError: {'code': 404, 'message': 'File Not Found', 'type': 'not_found_error'}
llama.cpp's server only exposes completion and embedding routes; I don't know if that's the problem.