huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

[BUG] Latest version cannot load Qwen2-VL model config correctly. #33401

Closed fyabc closed 1 month ago

fyabc commented 2 months ago

System Info

Who can help?

@amyeroberts @qubvel

Information

Tasks

Reproduction

  1. Download the config.json from Qwen2-VL-7B-Instruct HF main repo to /tmp/Qwen2-VL-7B-Instruct/config.json.
    • The downloaded config file content should be:
{
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "vision_start_token_id": 151652,
  "vision_end_token_id": 151653,
  "vision_token_id": 151654,
  "image_token_id": 151655,
  "video_token_id": 151656,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vision_config": {
    "depth": 32,
    "embed_dim": 1280,
    "mlp_ratio": 4,
    "num_heads": 16,
    "in_chans": 3,
    "hidden_size": 3584,
    "patch_size": 14,
    "spatial_merge_size": 2,
    "spatial_patch_size": 14,
    "temporal_patch_size": 2
  },
  "rope_scaling": {
    "type": "mrope",
    "mrope_section": [
      16,
      24,
      24
    ]
  },
  "vocab_size": 152064
}
  2. Install the latest transformers version via pip install git+https://github.com/huggingface/transformers@main
  3. Run the following script:
from transformers import AutoConfig
config = AutoConfig.from_pretrained('/tmp/Qwen2-VL-7B-Instruct/')
print(config)
  4. The result is:
Unrecognized keys in `rope_scaling` for 'rope_type'='default': {'mrope_section'}

Qwen2VLConfig {
  "_name_or_path": "/tmp/Qwen2-VL-7B-Instruct/",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "rope_type": "default",
    "type": "default"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}

It prints a warning message, and in the output both rope_scaling.type and rope_scaling.rope_type are set to "default", while "mrope" is expected.

Expected behavior

This bug seems to have been introduced in a recent version of transformers. When I switch to an older version via git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830, the output is correct:

Qwen2VLConfig {
  "_name_or_path": "/tmp/Qwen2-VL-7B-Instruct/",
  "architectures": [
    "Qwen2VLForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "image_token_id": 151655,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2_vl",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_scaling": {
    "mrope_section": [
      16,
      24,
      24
    ],
    "type": "mrope"
  },
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.45.0.dev0",
  "use_cache": true,
  "use_sliding_window": false,
  "video_token_id": 151656,
  "vision_config": {
    "in_chans": 3,
    "model_type": "qwen2_vl",
    "spatial_patch_size": 14
  },
  "vision_end_token_id": 151653,
  "vision_start_token_id": 151652,
  "vision_token_id": 151654,
  "vocab_size": 152064
}
wangaocheng commented 2 months ago

Yes, I see the same error.

LysandreJik commented 2 months ago

cc @zucchini-nlp as well I believe

zucchini-nlp commented 2 months ago

Hey! Yes, the warning is currently misleading: the RoPE implementation was recently standardized, and Qwen2-VL has a quite different rope-scaling dict compared to other models. Generation quality shouldn't be affected by this; as of my last interaction with the model, everything was the same as before the standardization.

cc @gante as well, as you're working on uniform-RoPE, this might be something we want to fix

gante commented 2 months ago

@zucchini-nlp if it is an expected argument, then we shouldn't throw a warning.

Perhaps we could add an extra_ignore_key argument to rope_config_validation, to define additional keys to ignore? I'm expecting this pattern (updating keys but wanting to keep the original in the config instance for BC) to come up again in the future.
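
(A rough, hypothetical sketch of that suggestion, not the actual transformers API; the real rope_config_validation lives in transformers' modeling_rope_utils and its final signature may differ.)

def rope_config_validation(config, extra_ignore_keys=None):
    # Validate config.rope_scaling against the keys expected for its rope type,
    # letting callers (e.g. Qwen2VLConfig) whitelist model-specific extras such
    # as "mrope_section" so no spurious warning is emitted.
    rope_scaling = getattr(config, "rope_scaling", None)
    if rope_scaling is None:
        return
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type", "default"))
    expected_keys = {"rope_type", "type"}  # per-rope-type expected keys would be looked up here
    ignore_keys = set(extra_ignore_keys or ())
    unrecognized = set(rope_scaling) - expected_keys - ignore_keys
    if unrecognized:
        print(f"Unrecognized keys in `rope_scaling` for 'rope_type'='{rope_type}': {unrecognized}")

# e.g. Qwen2VLConfig.__init__ could then call:
#     rope_config_validation(self, extra_ignore_keys={"mrope_section"})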

zucchini-nlp commented 2 months ago

@gante yes, that sounds good. I believe this will be part of your RoPE standardization PR, since it's not very urgent and generation is not broken

monkeywl2020 commented 2 months ago

In the initialization function of class Qwen2VLConfig in src/transformers/models/qwen2_vl/configuration_qwen2_vl.py, I found this code.

if self.rope_scaling is not None and "type" in self.rope_scaling:
    if self.rope_scaling["type"] == "mrope":
        self.rope_scaling["type"] = "default"
    self.rope_scaling["rope_type"] = self.rope_scaling["type"]

This code modifies the configuration: rope_scaling["type"] and rope_scaling["rope_type"] are both changed to "default".

zucchini-nlp commented 2 months ago

@monkeywl2020 yes, that was a hack to enable uniform RoPE, which currently doesn't accept an "mrope" type. mrope is the same as the default rope, with the only difference that the position ids have an extra dimension for the height/width/temporal dims.

We'll handle this in a better way, to accept non-standard rope kwargs soon
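
(To illustrate the shape difference described above: a standalone sketch with made-up sizes, not code from the model.)

import torch

# Plain RoPE: one position per token -> position_ids of shape (batch, seq_len).
# Qwen2-VL's mrope: one row of positions per temporal/height/width axis
# -> shape (3, batch, seq_len); mrope_section = [16, 24, 24] then splits the
# rotary dimensions among those three axes.
batch, seq_len = 1, 8
default_position_ids = torch.arange(seq_len).expand(batch, seq_len)               # (1, 8)
mrope_position_ids = default_position_ids.unsqueeze(0).expand(3, batch, seq_len)  # (3, 1, 8)
print(default_position_ids.shape, mrope_position_ids.shape)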

monkeywl2020 commented 2 months ago

@monkeywl2020 yes, that was a hack to enable uniform RoPE, which currently doesn't accept an "mrope" type. mrope is the same as the default rope, with the only difference that the position ids have an extra dimension for the height/width/temporal dims.

We'll handle this in a better way, to accept non-standard rope kwargs soon

OK

fyabc commented 2 months ago

@zucchini-nlp Hi, can you provide an approximate time for this bug to be fixed?

zucchini-nlp commented 2 months ago

@gante will you add this to your general RoPE PR, or should we fix it separately?

exceedzhang commented 1 month ago

the same error!

RANYABING commented 1 month ago

same error!

Unrecognized keys in rope_scaling for 'rope_type'='default': {'mrope_section'} Traceback (most recent call last): ......

IvanZidov commented 1 month ago

Same here!

niaoyu commented 1 month ago

Just pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830

is OK.

The PR https://github.com/huggingface/transformers/pull/32617 seems to break the logic around the Qwen rope parameters.

xuyue1112 commented 1 month ago

same problem. If I have already trained with the latest version of master, do I need to retrain with 21fac7abba2a37fae86106f87fcf9974fd1e3830, or do I only need to use this version for inference?

Andcircle commented 1 month ago

Just pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830

is OK.

The PR #32617 seems to break the logic around the Qwen rope parameters.

Thanks for the help. After installing this specific version, I'm now facing a different error: No module named 'transformers.models.mllama'

Any hints?

zucchini-nlp commented 1 month ago

Just a heads up, a fix PR is already on its way. For anyone who faces the same problem, the warning is a "fake warning" and in fact nothing is broken. So feel free to use any version of transformers and safely ignore the warning message 🤗
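
(A quick way to check this for yourself, reusing the config path from the original report: the mrope_section values survive the reload even though the type label is rewritten.)

from transformers import AutoConfig

config = AutoConfig.from_pretrained('/tmp/Qwen2-VL-7B-Instruct/')

# On affected versions the label is rewritten to "default", but the section
# sizes that drive the multimodal rotary embedding are still present, which
# is why generation is unaffected.
print(config.rope_scaling.get("type"))           # "default" (or "mrope" once fixed)
print(config.rope_scaling.get("mrope_section"))  # [16, 24, 24]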

Motoroller89 commented 1 month ago

Just a heads up, a fix PR is already on its way. For anyone who faces the same problem, the warning is a "fake warning" and in fact nothing is broken. So feel free to use any version of transformers and safely ignore the warning message 🤗

Can you give a link to the PR, so we can see when this problem ('No module named 'transformers.models.mllama'') will be solved?

zucchini-nlp commented 1 month ago

#33753 was merged on main; try installing from source: !pip install --upgrade git+https://github.com/huggingface/transformers.git

The mllama problem is probably due to the transformers version, as the model was only added in the latest release, so any previous version will throw that error.
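
(A quick way to check which case you are in; the importlib lookup is just a convenience here, not an official transformers API.)

import importlib.util
import transformers

print(transformers.__version__)
# mllama was only added in a recent release; if this prints False, the installed
# transformers predates the model and will raise "No module named 'transformers.models.mllama'".
print(importlib.util.find_spec("transformers.models.mllama") is not None)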

ArthurZucker commented 1 month ago

Patch will come out later today!