instructlab / instructlab

InstructLab Command-Line Interface. Use this to chat with a model and execute the InstructLab workflow to train a model using custom taxonomy data.
https://instructlab.ai
Apache License 2.0

Give a more friendly error if you try to use a gguf model with vllm #1571

Open russellb opened 1 week ago

russellb commented 1 week ago

I had an existing config and switched my serve backend setting to vllm. This was using a quantized version of Mixtral in gguf format.

The error I got didn't make it obvious at all that a gguf model wasn't supported.

Traceback (most recent call last):
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/transformers/configuration_utils.py", line 722, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/transformers/configuration_utils.py", line 825, in _dict_from_json_file
    text = reader.read()
           ^^^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 8: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 196, in <module>
    engine = AsyncLLMEngine.from_engine_args(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 371, in from_engine_args
    engine_config = engine_args.create_engine_config()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/vllm/engine/arg_utils.py", line 630, in create_engine_config
    model_config = ModelConfig(
                   ^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/vllm/config.py", line 137, in __init__
    self.hf_config = get_config(self.model, trust_remote_code, revision,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/vllm/transformers_utils/config.py", line 33, in get_config
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 965, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/venv/lib64/python3.11/site-packages/transformers/configuration_utils.py", line 726, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at 'models/mixtral-8x7b-v0.1.Q4_K_M.gguf' is not a valid JSON file.
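The traceback bottoms out in transformers trying to parse the binary gguf file as a JSON config, which is why the message is so opaque. A friendlier pre-flight check could detect the format before handing the path to vllm. This is a hypothetical sketch, not the project's actual fix; the function names are illustrative, but the 4-byte `GGUF` magic at the start of the file is part of the GGUF format:

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def is_gguf(model_path: str) -> bool:
    """Return True if the file begins with the GGUF magic bytes."""
    path = Path(model_path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


def validate_backend(model_path: str, backend: str) -> None:
    """Fail early, with a readable message, instead of letting vllm
    choke on a binary file it cannot parse.  Hypothetical helper."""
    if backend == "vllm" and is_gguf(model_path):
        raise ValueError(
            f"Model '{model_path}' is a GGUF file, which the vllm backend "
            "does not support. Use the llama-cpp backend for GGUF models, "
            "or point vllm at a Hugging Face / safetensors model directory."
        )
```

A check like this would turn the `UnicodeDecodeError`/`OSError` pair above into a single actionable sentence before the server subprocess is ever launched.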
nathan-weinberg commented 1 week ago

Related to https://github.com/instructlab/instructlab/issues/1555

leseb commented 1 week ago

This error happens inside the subprocess and its stdout is never handled. This is a bug in the validation anyway. Will fix that.
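One way to surface the swallowed output is to capture the child's stdout/stderr and include it in the error if the process dies during startup. A minimal sketch, assuming a launcher function of this shape (the names and the 2-second startup window are illustrative, not InstructLab's actual code):

```python
import subprocess


def launch_server(cmd: list[str]) -> subprocess.Popen:
    """Launch the serving subprocess and surface its output on failure.

    Hypothetical sketch: combine stdout and stderr so that a crash in the
    child (e.g. vllm rejecting a GGUF model) is reported to the user
    instead of being silently dropped.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        # Give the process a short window; if it exits within it,
        # treat that as a startup failure and report what it printed.
        output, _ = proc.communicate(timeout=2)
    except subprocess.TimeoutExpired:
        return proc  # still running: assume startup is proceeding
    raise RuntimeError(
        f"server exited with code {proc.returncode}:\n{output}"
    )
```

With this, the `OSError: It looks like the config file ... is not a valid JSON file` line from the child would at least reach the user's terminal rather than vanishing.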

leseb commented 1 week ago

I just need a few PRs to merge before I can do this one, namely https://github.com/instructlab/instructlab/pull/1576 and https://github.com/instructlab/instructlab/pull/1561

leseb commented 6 days ago

https://github.com/instructlab/instructlab/pull/1576 will fix this.