Describe the bug
I finetuned the Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4 model with `swift sft`, and the resulting checkpoint can be loaded for inference with `swift infer` without issue. However, loading the model with PEFT raises the following error:
```
ValueError: Target module Qwen2VLModel(
  (embed_tokens): Embedding(151936, 1536)
  (layers): ModuleList(
    (0-27): 28 x Qwen2VLDecoderLayer(
      (self_attn): Qwen2VLSdpaAttention(
        (rotary_emb): Qwen2VLRotaryEmbedding()
        (k_proj): QuantLinear()
        (o_proj): QuantLinear()
        (q_proj): QuantLinear()
        (v_proj): QuantLinear()
      )
      (mlp): Qwen2MLP(
        (act_fn): SiLU()
        (down_proj): QuantLinear()
        (gate_proj): QuantLinear()
        (up_proj): QuantLinear()
      )
      (input_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
      (post_attention_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
    )
  )
  (norm): Qwen2RMSNorm((1536,), eps=1e-06)
  (rotary_emb): Qwen2VLRotaryEmbedding()
) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
```
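For reference, this is roughly the PEFT-only loading code that triggers the error; the checkpoint directory is a placeholder for my actual `swift sft` output path:

```python
from transformers import Qwen2VLForConditionalGeneration
from peft import PeftModel

# Load the GPTQ-Int4 base model (the quantized kernels are picked up
# automatically, provided the GPTQ backend is installed).
base = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4",
    device_map="auto",
    torch_dtype="auto",
)

# Attach the LoRA adapter saved by `swift sft`; the path below is a
# placeholder for my actual output directory.
model = PeftModel.from_pretrained(
    base, "output/qwen2-vl-2b-instruct-gptq-int4/vx-xxx/checkpoint-xxx"
)
# -> ValueError: Target module Qwen2VLModel(...) is not supported.
```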
In the meantime, `swift export` with `merge_lora` enabled does not help either, since `merge_lora` cannot be used when the base model is quantized. Is there any way to load the finetuned, quantized model for inference without the swift library (e.g., using peft alone)?
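For completeness, the export invocation I tried was roughly the following (checkpoint path again a placeholder); it refuses to merge because the base model is GPTQ-quantized:

```bash
swift export \
    --ckpt_dir output/qwen2-vl-2b-instruct-gptq-int4/vx-xxx/checkpoint-xxx \
    --merge_lora true
```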
Your hardware and system info
CUDA version: 12.1
GPU: 2x RTX 4090