OK, thanks. This code errors out when fine-tuning on a 4090 (python finetune.py). Can't a 4090 even fine-tune a 2B model?
Not enough VRAM... As I recall, with bf16 and batch size 1, it uses around 40GB of VRAM.
You could try swapping AdamW for SGD; that should also save a lot of memory. In addition, in this line:

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", min_pixels=256*28*28, max_pixels=512*28*28, padding_side="right")

you can lower the 256 and 512, which compresses images more aggressively.
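A minimal sketch of both changes, assuming a standalone setup; the 128/256 factors and the learning rate are illustrative, not values taken from finetune.py:

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Shrink the per-image pixel budget so images get compressed harder
# (halving the suggested 256/512 factors here, purely as an example).
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    min_pixels=128 * 28 * 28,
    max_pixels=256 * 28 * 28,
    padding_side="right",
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16
)

# SGD keeps no per-parameter moment buffers, so unlike AdamW it adds
# no optimizer state on top of the weights and gradients.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
```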
The latest version of this repo can fine-tune with flash-attention-2, which saves memory. Try it. I'll close this issue now.
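If you are loading the model yourself, enabling FlashAttention-2 looks roughly like this (this assumes the flash-attn package is installed; the repo's finetune.py may wire it up differently):

```python
import torch
from transformers import Qwen2VLForConditionalGeneration

# FlashAttention-2 cuts attention memory, especially at long sequence
# lengths; it requires bf16/fp16 and the flash-attn package.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```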
The Qwen2-VL interface doesn't seem to expose vision_model or language_model sub-modules.

(Pdb++) model
# Qwen2VLForConditionalGeneration(
# (visual): Qwen2VisionTransformerPretrainedModel(
# (patch_embed): PatchEmbed(
# (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
# )
# (rotary_pos_emb): VisionRotaryEmbedding()
# (blocks): ModuleList(
# (0-31): 32 x Qwen2VLVisionBlock(
# (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
# (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
# (attn): VisionSdpaAttention(
# (qkv): Linear(in_features=1280, out_features=3840, bias=True)
# (proj): Linear(in_features=1280, out_features=1280, bias=True)
# )
# (mlp): VisionMlp(
# (fc1): Linear(in_features=1280, out_features=5120, bias=True)
# (act): QuickGELUActivation()
# (fc2): Linear(in_features=5120, out_features=1280, bias=True)
# )
# )
# )
# (merger): PatchMerger(
# (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
# (mlp): Sequential(
# (0): Linear(in_features=5120, out_features=5120, bias=True)
# (1): GELU(approximate='none')
# (2): Linear(in_features=5120, out_features=1536, bias=True)
# )
# )
# )
# (model): Qwen2VLModel(
# (embed_tokens): Embedding(151936, 1536)
# (layers): ModuleList(
# (0-27): 28 x Qwen2VLDecoderLayer(
# (self_attn): Qwen2VLSdpaAttention(
# (q_proj): Linear(in_features=1536, out_features=1536, bias=True)
# (k_proj): Linear(in_features=1536, out_features=256, bias=True)
# (v_proj): Linear(in_features=1536, out_features=256, bias=True)
# (o_proj): Linear(in_features=1536, out_features=1536, bias=False)
# (rotary_emb): Qwen2RotaryEmbedding()
# )
# (mlp): Qwen2MLP(
# (gate_proj): Linear(in_features=1536, out_features=8960, bias=False)
# (up_proj): Linear(in_features=1536, out_features=8960, bias=False)
# (down_proj): Linear(in_features=8960, out_features=1536, bias=False)
# (act_fn): SiLU()
# )
# (input_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
# (post_attention_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
# )
# )
# (norm): Qwen2RMSNorm((1536,), eps=1e-06)
# )
# (lm_head): Linear(in_features=1536, out_features=151936, bias=False)
# )
Above is the printed structure of the Qwen2-VL model.
You can use model.visual or model.visual.patch_embed to access the model's sub-modules and set requires_grad, as in the sketch below.
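For example, a sketch using the sub-module names from the printout above:

```python
# Freeze the whole vision tower, then re-enable just the patch embedding.
for p in model.visual.parameters():
    p.requires_grad = False

for p in model.visual.patch_embed.parameters():
    p.requires_grad = True
```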
And which module corresponds to the language model?
This code fine-tunes all parameters. If you only want to fine-tune part of the model, you can add a few lines of code to freeze some parameters so they are not updated; for example, see the relevant lines in this file: https://github.com/zhangfaen/finetune-InternVL2/blob/main/train.py
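A sketch of that pattern; freezing the vision tower and training only the language model is just an illustrative split, not necessarily what train.py does:

```python
import torch

# Freeze every parameter that belongs to the vision tower.
for name, param in model.named_parameters():
    if name.startswith("visual."):
        param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer, so no
# optimizer state is allocated for the frozen ones.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
```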