I believe this is an issue in transformers:
from transformers import AutoConfig
AutoConfig.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 1.63k/1.63k [00:00<00:00, 14.6MB/s]
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1117, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 813, in __getitem__
    raise KeyError(key)
KeyError: 'llava_mistral'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1119, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `llava_mistral` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
transformers has to add this key to their AutoConfig mapping; usually the model author would file a PR for it. Of course, SGLang also needs to support the architecture name LlavaMistralForCausalLM. It currently only supports LlavaLlamaForCausalLM.
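For reference, registering the missing model_type locally is enough to make AutoConfig resolve (it does not make the LlavaMistralForCausalLM architecture itself loadable). A minimal sketch, assuming transformers' public AutoConfig.register API:

from transformers import AutoConfig
from transformers.models.llava.configuration_llava import LlavaConfig

# AutoConfig.register requires the config class's model_type to match the key,
# so subclass LlavaConfig with the new model_type.
class LlavaMistralConfig(LlavaConfig):
    model_type = "llava_mistral"

AutoConfig.register("llava_mistral", LlavaMistralConfig)

config = AutoConfig.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")
print(config.architectures)  # ['LlavaMistralForCausalLM']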
Yes, that's a better description of the underlying cause. Is it possible for SGLang to add a mapping for the AutoConfig? I would assume that's not too hard as a patch until transformers gets to it (I doubt they actually would since they seem to treat BakLLaVa as a "llava" config).
Note: I'm pretty sure this model would also hit the same problem I describe in #127 after solving these issues.
This approach "kind of" works:
>>> from transformers.models.llava.configuration_llava import LlavaConfig
>>> LlavaConfig.from_pretrained("liuhaotian/llava-v1.6-mistral-7b")
You are using a model of type llava_mistral to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
LlavaConfig {
"_name_or_path": "mistralai/Mistral-7B-Instruct-v0.2",
"architectures": [
"LlavaMistralForCausalLM"
],
...
You can hack hf_transformers_utils.py in SGLang to make it work, but I'm not sure whether the model loaded this way would be functionally correct. Still, I don't think this problem should be fixed in SGLang.
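The hack would be roughly along these lines (just a sketch; the get_config wrapper name is made up and the real helper in hf_transformers_utils.py may look different):

from transformers import AutoConfig
from transformers.models.llava.configuration_llava import LlavaConfig

def get_config(model_path, trust_remote_code=False):
    # Hypothetical wrapper: fall back to LlavaConfig when AutoConfig rejects
    # the unknown model_type "llava_mistral".
    try:
        return AutoConfig.from_pretrained(model_path, trust_remote_code=trust_remote_code)
    except ValueError:
        return LlavaConfig.from_pretrained(model_path)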
Another way is:
1. git clone https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b
2. Edit llava-v1.6-mistral-7b/config.json, changing its architecture to LlavaLlamaForCausalLM (a small script for this step is sketched below).
3. Launch with --model-path ./llava-v1.6-mistral-7b
I would recommend this way, as Mistral has exactly the same architecture as Llama 2, so the checkpoints are shareable.
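If you prefer to script the config edit in step 2 instead of doing it by hand, a small sketch (assuming the clone lives in ./llava-v1.6-mistral-7b):

import json

config_path = "llava-v1.6-mistral-7b/config.json"

with open(config_path) as f:
    config = json.load(f)

# Swap the unsupported architecture name for the one SGLang already handles.
config["architectures"] = ["LlavaLlamaForCausalLM"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)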
Thanks @comaniac, this gets me past the "checkpoint you are trying to load has model type llava_mistral" error, but the next blocker is: llava-v1.6-mistral-7b does not appear to have a file named preprocessor_config.json.
I've created a repo for llava-v1.6-34B that works: https://huggingface.co/dillonlaird/hf-llava-v1.6-34b. Looking at the other llava-hf models, you basically need to add the preprocessor_config.json, which should be the same, and then add the added_tokens.json file.
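If you are patching a local clone yourself, something like this can copy the file over from one of the existing llava-hf repos (llava-hf/llava-1.5-7b-hf as the source is my assumption, and whether its preprocessor settings are fully correct for 1.6 is not guaranteed):

import shutil
from huggingface_hub import hf_hub_download

# Download preprocessor_config.json from an existing llava-hf checkpoint and
# place it next to the local llava-v1.6 clone's config.json.
src = hf_hub_download("llava-hf/llava-1.5-7b-hf", "preprocessor_config.json")
shutil.copy(src, "llava-v1.6-mistral-7b/preprocessor_config.json")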
This still doesn't work for llava 7b mistral. The model can load with the hack to the config file, but it outputs gibberish, with 'argued' as the most common token.
I just got it working; see https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b/discussions/2/files
that is also a great upstream way to fix this!
Thanks @dillonalaird, when trying to run your version of llava-1.6 using
python3 -m sglang.launch_server \
--model-path dillonlaird/hf-llava-v1.6-34b \
--tokenizer-path dillonlaird/hf-llava-v1.6-34b \
--host "0.0.0.0" \
--tp 2 \
--port 2222 \
--model-mode flashinfer
I ran into the error RuntimeError: BatchPrefillWithPagedKVCache failed to dispatch with dtype Half
How are you running this for it to work for you? Thanks!
I went and patched the 1.6 Mistral model to work more smoothly with SGLang: SurfaceData/llava-v1.6-mistral-7b-sglang. I also tweaked the Vicuna version: SurfaceData/llava-v1.6-vicuna-7b-sglang.
Does it error if you don't use flashinfer? The error message suggests that flashinfer cannot cast your model to fp16, possibly due to outdated hardware or an outdated CUDA driver.
I have the same error with flashinfer too; llava-1.5 works fine with flashinfer. It sounds like there is a bug specific to llava-1.6 model loading with flashinfer.
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
I have the same issue. Are there any updates?
When trying to load the Mistral variant of LLaVA 1.6, I get the same error described above. Transformers doesn't treat the LLaVA variants any differently; they all use the same config. I think this could be easily fixed by adding a mapping from llava_mistral to LlavaConfig in the config mapping.
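As a quick sanity check on that claim (the llava-hf/bakLlava-v1-hf repo id is my assumption here), the Mistral-based BakLLaVA checkpoint already resolves to the plain LlavaConfig:

from transformers import AutoConfig

# BakLLaVA is Mistral-based but ships with model_type "llava", so AutoConfig
# resolves it to the generic LlavaConfig without any extra mapping.
config = AutoConfig.from_pretrained("llava-hf/bakLlava-v1-hf")
print(type(config).__name__)  # LlavaConfig
print(config.model_type)      # llava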