Finetuned Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 cannot be loaded with PEFT #2203

Open ROIM1998 opened 3 days ago

ROIM1998 commented 3 days ago

Describe the bug

I fine-tuned the Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 model with swift sft, and the resulting model loads successfully for inference with swift infer. However, loading the model with PEFT raises the following error:

ValueError: Target module Qwen2VLModel(
  (embed_tokens): Embedding(151936, 1536)
  (layers): ModuleList(
    (0-27): 28 x Qwen2VLDecoderLayer(
      (self_attn): Qwen2VLSdpaAttention(
        (rotary_emb): Qwen2VLRotaryEmbedding()
        (k_proj): QuantLinear()
        (o_proj): QuantLinear()
        (q_proj): QuantLinear()
        (v_proj): QuantLinear()
      )
      (mlp): Qwen2MLP(
        (act_fn): SiLU()
        (down_proj): QuantLinear()
        (gate_proj): QuantLinear()
        (up_proj): QuantLinear()
      )
      (input_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
      (post_attention_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
    )
  )
  (norm): Qwen2RMSNorm((1536,), eps=1e-06)
  (rotary_emb): Qwen2VLRotaryEmbedding()
) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.

Running swift export with merge_lora = True does not help either, because merge_lora cannot be enabled when the base model is quantized. Is there a way to load the fine-tuned, quantized model for inference without the swift library (e.g., using only peft)?
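The error above names the whole Qwen2VLModel container as the target module, which suggests that the target_modules value saved with the adapter selected the container itself and not only its linear layers. This is a guess, not something confirmed in this thread. PEFT treats a string-valued target_modules as a regular expression that must match the full module name, so a broad pattern can select a container. The sketch below reproduces that matching with plain `re`. The module names and both patterns are hypothetical illustrations, not values read from a real adapter_config.json.

```python
import re


def fullmatch_targets(pattern, module_names):
    """Return the names that a string target_modules pattern selects under re.fullmatch."""
    return [name for name in module_names if re.fullmatch(pattern, name)]


# Assumed module names, following the usual layout where the printed
# Qwen2VLModel is registered as "model" on the top-level model.
module_names = [
    "model",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]

# Hypothetical broad pattern: everything under "model", including "model" itself.
broad_pattern = r"^(model)(?!.*(lm_head|output|emb|wte|shared)).*"

# Hypothetical narrow pattern: only the projection layers listed in the printout.
narrow_pattern = r".*\.(q_proj|k_proj|v_proj|o_proj|gate_proj|up_proj|down_proj)"

broad_hits = fullmatch_targets(broad_pattern, module_names)
narrow_hits = fullmatch_targets(narrow_pattern, module_names)

print(broad_hits)   # includes the container name "model"
print(narrow_hits)  # only the leaf projection layers
```

Comparing the target_modules entry in the adapter's adapter_config.json against patterns like these may show whether this is what happens here.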

Your hardware and system info

CUDA version: 12.1
GPU: 2x RTX 4090

Jintao-Huang commented 2 days ago

Please use Swift.from_pretrained.
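A minimal sketch of what that suggestion might look like, assuming ms-swift and transformers are installed along with the GPTQ backend that transformers needs for this checkpoint. The helper name load_finetuned and its arguments are hypothetical. Only Swift.from_pretrained comes from the comment above.

```python
def load_finetuned(base_id, adapter_dir):
    """Load the quantized base model, then attach the swift-trained adapter.

    base_id: model id of the quantized base model.
    adapter_dir: checkpoint directory written by swift sft (hypothetical path).
    """
    # Imports are local so this file can be imported without the heavy dependencies.
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
    from swift import Swift

    # Load the GPTQ-quantized base model; this step does not involve swift.
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        base_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(base_id)

    # Attach the fine-tuned adapter with swift's loader, not PeftModel.from_pretrained.
    model = Swift.from_pretrained(model, adapter_dir)
    model.eval()
    return model, processor
```

It would be called as load_finetuned("Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4", adapter_dir), where adapter_dir is the checkpoint directory produced by swift sft. This still depends on the swift package at load time, so it does not fully answer the request for a peft-only path.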