Groq integration - Githubissues

Hi all,
I have been trying to get h2ogpt to work in remote inference server mode, but so far haven't been successful. This is how I invoke the app, as mentioned here: python generate.py --inference_server="vllm:https://api.groq.com/openai:None:/v1:$GROQ_API_KEY" --base_model='mixtral-8x7b-32768' --max_seq_len=31744 --prompt_type='plain'
And this is the output:
No GPUs detected
Using Model mixtral-8x7b-32768
Starting get_model: mixtral-8x7b-32768 vllm:https://api.groq.com/openai:None:/v1:<groq_api_key>
Not using tokenizer from HuggingFace:

Traceback (most recent call last):
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 270, in hf_raise_for_status
    response.raise_for_status()
  File "/home/premchan/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/mixtral-8x7b-32768/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/premchan/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1374, in hf_hub_download
    raise head_call_error
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1247, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1624, in get_hf_file_metadata
    r = _request_wrapper(
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 402, in _request_wrapper
    response = _request_wrapper(
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 426, in _request_wrapper
    hf_raise_for_status(response)
  File "/home/premchan/.local/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 320, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-65f2dc5e-16ee07f97270004e1946decb;f647e478-1e21-4291-a5d0-30cfb7cc030f)

Repository Not Found for url: https://huggingface.co/mixtral-8x7b-32768/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/premchan/h2ogpt/src/gen.py", line 2241, in get_config
    config = AutoConfig.from_pretrained(base_model, token=use_auth_token,
  File "/home/premchan/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1111, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/premchan/.local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 633, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/premchan/.local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 688, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/premchan/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 421, in cached_file
    raise EnvironmentError(
OSError: mixtral-8x7b-32768 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
Model {'base_model': 'mixtral-8x7b-32768', 'base_model0': 'mixtral-8x7b-32768', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': 'vllm:https://api.groq.com/openai:None:/v1:<groq_api_key>', 'prompt_type': 'plain', 'prompt_dict': {'promptA': None, 'promptB': None, 'PreInstruct': None, 'PreInput': None, 'PreResponse': None, 'terminate_response': [], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': None, 'botstr': None, 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': False, 'use_flash_attention_2': False, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': False, 'gpu_id': None, 'compile_model': None, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': '', 'model_name_gptj': '', 'model_name_gpt4all_llama': '', 'model_name_exllama_if_no_config': ''}, 'rope_scaling': {}, 'max_seq_len': 31694, 'max_output_seq_len': None, 'exllama_dict': {}, 'gptq_dict': {}, 'attention_sinks': False, 'sink_dict': {}, 'truncation_generation': False, 'hf_model_dict': {}}
Begin auto-detect HF cache text generation models
No loading model microsoft/speecht5_hifigan because The checkpoint you are trying to load has model type `hifigan` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
No loading model microsoft/speecht5_tts because is_encoder_decoder=True
No loading model openai/whisper-base.en because is_encoder_decoder=True
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
/home/premchan/.local/lib/python3.10/site-packages/gradio/components/dropdown.py:173: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: None or set allow_custom_value=True.
  warnings.warn(
Running on local URL:  http://0.0.0.0:7860

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB

To create a public link, set `share=True` in `launch()`.
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7860/
/home/premchan/.local/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/home/premchan/.local/lib/python3.10/site-packages/pydantic/_internal/_fields.py:151: UserWarning: Field "model_names" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
OpenAI API URL: http://0.0.0.0:5000
INFO:__name__:OpenAI API URL: http://0.0.0.0:5000
OpenAI API key: EMPTY
INFO:__name__:OpenAI API key: EMPTY
INFO:     10.198.5.232:59222 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59222 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
INFO:     10.198.5.232:59222 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59222 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59222 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
INFO:     10.198.5.232:59222 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "POST /queue/join HTTP/1.1" 200 OK
INFO:     10.198.5.232:59223 - "GET /queue/data?session_hash=xa743me4ifi HTTP/1.1" 200 OK
evaluate_nochat exception: Error code: 404 - {'error': {'message': 'Unknown request URL: POST /openai/v1/completions. Please check the URL for typos, or see the docs at https://console.groq.com/docs/', 'type': 'invalid_request_error', 'code': 'unknown_url'}}: ('', '', '', True, 'plain', "{   'PreInput': None,\n    'PreInstruct': None,\n    'PreResponse': None,\n    'botstr': None,\n    'can_handle_system_prompt': False,\n    'chat_sep': '\\n',\n    'chat_turn_sep': '\\n',\n    'generates_leading_space': False,\n    'humanstr': None,\n    'promptA': None,\n    'promptB': None,\n    'system_prompt': '',\n    'terminate_response': []}", 0.1, 0.75, 40, 0, 1, 1024, 0, False, 600, 1.07, 1, False, True, '', '', 'LLM', True, 'Query', [], 10, True, 512, 'Relevant', ['All'], [], [], [], [], 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'According to only the information in the document sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text.', 'Using only the information in the document sources above, write a condensed and concise summary of key results (preferably as bullet points).', 'Answer this question with vibrant details in order for some NLP embedding model to use that answer as better query than original question: ', 'auto', ['DocTR', 'ASR'], ['PyMuPDF'], ['Unstructured'], '.[]', 10, 'auto', [], '', False, '[]', '[]', 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, [], 1, None, None, {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'vllm:https://api.groq.com/openai:None:/v1:<groq_api_key>', 'base_model': 'mixtral-8x7b-32768', 'tokenizer_base_model': '', 'lora_weights': '[]', 'inference_server': 'vllm:https://api.groq.com/openai:None:/v1:<groq_api_key>', 'prompt_type': 'plain', 'prompt_dict': {'promptA': None, 'promptB': None, 'PreInstruct': None, 'PreInput': None, 'PreResponse': None, 'terminate_response': [], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': None, 'botstr': None, 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key': None}, {'MyData': [None, 'c01082d7-aff5-4066-8288-f2f4350d4a5e', 'c01082d7-aff5-4066-8288-f2f4350d4a5e']}, {'langchain_modes': ['UserData', 'MyData', 'LLM', 'Disabled'], 'langchain_mode_paths': {'UserData': None}, 'langchain_mode_types': {'UserData': 'shared', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'MyData': 'personal', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '', 'host': '<server_ip>:7860', 'username': 'c01082d7-aff5-4066-8288-f2f4350d4a5e', 'connection': 'keep-alive', 'content-length': '120', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36', 'dnt': '1', 'content-type': 'application/json', 'accept': '*/*', 'origin': 'http://<server_ip>:7860', 'referer': 'http://<server_ip>:7860/', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-IN,en-GB;q=0.9,en-US;q=0.8,en;q=0.7,nl;q=0.6,fr;q=0.5', 'host2': '10.198.5.232'}, {}, [['groq', '']])
Traceback (most recent call last):
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/route_utils.py", line 235, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1627, in process_api
    result = await self.call_function(
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/utils.py", line 514, in async_iteration
    return await iterator.__anext__()
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/utils.py", line 507, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/premchan/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/premchan/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/premchan/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/utils.py", line 490, in run_sync_iterator_async
    return next(iterator)
  File "/home/premchan/.local/lib/python3.10/site-packages/gradio/utils.py", line 673, in gen_wrapper
    response = next(iterator)
  File "/home/premchan/h2ogpt/src/gradio_runner.py", line 4573, in bot
    for res in get_response(fun1, history, chatbot_role1, speaker1, tts_language1, roles_state1,
  File "/home/premchan/h2ogpt/src/gradio_runner.py", line 4468, in get_response
    for output_fun in fun1():
  File "/home/premchan/h2ogpt/src/gen.py", line 4275, in evaluate
    responses = openai_client.completions.create(
  File "/home/premchan/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/home/premchan/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/resources/completions.py", line 516, in create
    return self._post(
  File "/home/premchan/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 1208, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/premchan/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 897, in request
    return self._request(
  File "/home/premchan/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/openai/_base_client.py", line 988, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': 'Unknown request URL: POST /openai/v1/completions. Please check the URL for typos, or see the docs at https://console.groq.com/docs/', 'type': 'invalid_request_error', 'code': 'unknown_url'}}
My system has openai 1.12.0 python package installed. Has anyone had any success doing this?
h2oai / h2ogpt

Groq integration #1476