sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

[Bug] liuhaotian/llava-v1.6-mistral-7b doesn't load #128

Closed fozziethebeat closed 3 months ago

fozziethebeat commented 9 months ago

When trying to load the Mistral variant of LLaVA 1.6, I get an unexpected error:

python3 -m sglang.launch_server --model-path liuhaotian/llava-v1.6-mistral-7b --chat-template vicuna_v1.1 --port 30000
ValueError: The checkpoint you are trying to load has model type `llava_mistral` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date

Transformers doesn't treat the LLaVA variants any differently; they all use the same config. I think this could be easily fixed by adding a mapping from llava_mistral to LlavaConfig in the config mapping.
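For reference, here is a minimal sketch of that kind of mapping done from user code rather than inside transformers; the LlavaMistralConfig subclass is purely illustrative, and it only addresses the config lookup, not whether the weights then load correctly:

```python
from transformers import AutoConfig
from transformers.models.llava.configuration_llava import LlavaConfig


class LlavaMistralConfig(LlavaConfig):
    # AutoConfig.register requires the class's model_type to match the key.
    model_type = "llava_mistral"


# Register the missing key so AutoConfig stops raising the KeyError/ValueError.
AutoConfig.register("llava_mistral", LlavaMistralConfig)

config = AutoConfig.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")
print(config.architectures)  # ['LlavaMistralForCausalLM']
```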

comaniac commented 9 months ago

I believe this is an issue in transformers:

from transformers import AutoConfig
AutoConfig.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 14.6MB/s]
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1117, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 813, in __getitem__
    raise KeyError(key)
KeyError: 'llava_mistral'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1119, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `llava_mistral` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

transformers has to add this key to their AutoConfig; usually the model author files a PR for that. Of course, SGLang also needs to support the architecture name LlavaMistralForCausalLM. It currently only supports LlavaLlamaForCausalLM.

fozziethebeat commented 9 months ago

Yes, that's a better description of the underlying cause. Is it possible for SGLang to add a mapping for the AutoConfig? I would assume that's not too hard as a patch until transformers gets to it (I doubt they actually will, since they seem to treat BakLLaVA as a "llava" config).

fozziethebeat commented 9 months ago

Note: I'm pretty sure this model would also hit the same problem I describe in #127 after solving these issues.

comaniac commented 9 months ago

This "kind of" works:

>>> from transformers.models.llava.configuration_llava import LlavaConfig
>>> LlavaConfig.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")
You are using a model of type llava_mistral to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
LlavaConfig {
  "_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
  "architectures": [
    "LlavaMistralForCausalLM"
  ],
...

You can hack hf_transformers_utils.py in SGLang to make it work, but I'm not sure whether the model loaded this way is functionally correct. Still, I don't think this problem should be fixed in SGLang.
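For illustration, a minimal sketch of the kind of fallback such a hack could implement; the helper name and structure here are assumptions, not SGLang's actual hf_transformers_utils.py code:

```python
from transformers import AutoConfig
from transformers.models.llava.configuration_llava import LlavaConfig


def load_config_with_llava_fallback(model_path: str):
    """Illustrative helper: fall back to LlavaConfig when AutoConfig
    does not recognize the checkpoint's model_type."""
    try:
        return AutoConfig.from_pretrained(model_path)
    except (KeyError, ValueError):
        # model_type "llava_mistral" is missing from CONFIG_MAPPING, so load
        # the config through LlavaConfig, which shares the same fields.
        return LlavaConfig.from_pretrained(model_path)
```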

Another way is:

  1. Download the model to local disk: git clone https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b
  2. Modify llava-v1.6-mistral-7b/config.json by changing its architectures entry to LlavaLlamaForCausalLM (see the sketch below).
  3. Load the modified checkpoint with --model-path ./llava-v1.6-mistral-7b

I would recommend this way, as Mistral has exactly the same architecture as Llama 2, so the checkpoints are shareable.
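As a small sketch of the step-2 edit (assuming the checkpoint from step 1 was cloned into the current directory):

```python
import json

# Point the checkpoint's declared architecture at the class SGLang already supports.
config_path = "llava-v1.6-mistral-7b/config.json"
with open(config_path) as f:
    config = json.load(f)

config["architectures"] = ["LlavaLlamaForCausalLM"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```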

clam004 commented 9 months ago

Thanks @comaniac, this gets me past the `The checkpoint you are trying to load has model type llava_mistral` error, but the next blocker is `llava-v1.6-mistral-7b does not appear to have a file named preprocessor_config.json`.

dillonalaird commented 9 months ago

I've created a repo for llava-v1.6-34B that works: https://huggingface.co/dillonlaird/hf-llava-v1.6-34b. Looking at the other llava-hf models, you basically need to add the preprocessor_config.json (which should be the same) and then add the added_tokens.json file.
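If it helps, a quick way to sanity-check that those two files are being picked up (using the repo above; this is just an illustrative check, not something the loader requires):

```python
from transformers import AutoImageProcessor, AutoTokenizer

repo = "dillonlaird/hf-llava-v1.6-34b"

# preprocessor_config.json is what lets AutoImageProcessor resolve.
image_processor = AutoImageProcessor.from_pretrained(repo)
print(type(image_processor).__name__)

# added_tokens.json shows up in the tokenizer's added vocabulary.
tokenizer = AutoTokenizer.from_pretrained(repo)
print(tokenizer.get_added_vocab())
```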

aliencaocao commented 9 months ago

Still doesn't work for the LLaVA 1.6 Mistral 7B model. The model can load with the config-file hack, but it outputs gibberish, with the most common token being 'argued'.

aliencaocao commented 9 months ago

Just got it working; see https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b/discussions/2/files

fozziethebeat commented 9 months ago

That is also a great upstream way to fix this!

clam004 commented 9 months ago

> I've created a repo for llava-v1.6-34B that works: https://huggingface.co/dillonlaird/hf-llava-v1.6-34b. Looking at the other llava-hf models, you basically need to add the preprocessor_config.json (which should be the same) and then add the added_tokens.json file.

Thanks @dillonalaird. When trying to run your version of llava-1.6 using

python3 -m sglang.launch_server \
--model-path dillonlaird/hf-llava-v1.6-34b \
--tokenizer-path dillonlaird/hf-llava-v1.6-34b \
--host "0.0.0.0" \
--tp 2 \
--port 2222 \
--model-mode flashinfer

I ran into the error RuntimeError: BatchPrefillWithPagedKVCache failed to dispatch with dtype Half.

How are you running this so that it works for you? Thanks!

fozziethebeat commented 8 months ago

I went and patched the 1.6 Mistral model to work more smoothly with SGLang: SurfaceData/llava-v1.6-mistral-7b-sglang. I also tweaked the vicuna version as well SurfaceData/llava-v1.6-vicuna-7b-sglang

aliencaocao commented 8 months ago

> I've created a repo for llava-v1.6-34B that works: https://huggingface.co/dillonlaird/hf-llava-v1.6-34b. Looking at the other llava-hf models, you basically need to add the preprocessor_config.json (which should be the same) and then add the added_tokens.json file.
>
> Thanks @dillonalaird. When trying to run your version of llava-1.6 using
>
> python3 -m sglang.launch_server \
> --model-path dillonlaird/hf-llava-v1.6-34b \
> --tokenizer-path dillonlaird/hf-llava-v1.6-34b \
> --host "0.0.0.0" \
> --tp 2 \
> --port 2222 \
> --model-mode flashinfer
>
> I ran into the error RuntimeError: BatchPrefillWithPagedKVCache failed to dispatch with dtype Half.
>
> How are you running this so that it works for you? Thanks!

Does it error if you don't use flashinfer? The error message sounds like flashinfer cannot cast your model to fp16, possibly due to outdated hardware or an outdated CUDA driver.

Gutianpei commented 8 months ago

I have the same error with flashinfer too; llava-1.5 is fine with flashinfer. It sounds like there is a bug specific to loading the llava-1.6 model with flashinfer.

github-actions[bot] commented 3 months ago

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.

pooya-mohammadi commented 2 months ago

I have the same issue. Are there any updates?