zhangfaen / finetune-Qwen2-VL

MIT License

Isn't this code's fine-tuning LoRA fine-tuning? Which layers does it actually fine-tune? #1

Closed lonngxiang closed 2 months ago

zhangfaen commented 2 months ago

This code fine-tunes all of the model's parameters. If you only want to fine-tune part of the model, you can add a few lines of code to freeze some parameters so they are not updated, for example these lines from https://github.com/zhangfaen/finetune-InternVL2/blob/main/train.py:

model.vision_model.requires_grad_(False)    # freeze the parameters in this module; they will not be updated
model.language_model.requires_grad_(False)  # freeze the parameters in this module; they will not be updated

logger.info(f"total params for Lora training: {sum(p.numel() for p in model.parameters())}")
logger.info(f"total trainable params for Lora training: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")

optimizer = AdamW(model.parameters(), lr=lr)
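
One small refinement of the snippet above (a sketch; model and lr are assumed to be defined as in train.py): after freezing modules, it is idiomatic to hand the optimizer only the still-trainable parameters.

from torch.optim import AdamW

# Only parameters with requires_grad=True will receive gradients,
# so pass just those to the optimizer.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params, lr=lr)
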
lonngxiang commented 2 months ago

OK, thanks. This code throws an error when fine-tuning on a 4090. Can a 4090 really not fine-tune the 2B model? python finetune.py

[error screenshot]

zhangfaen commented 2 months ago

Not enough GPU memory... As I recall, with bf16 and batch size 1, it used around 40 GB of VRAM.

zhangfaen commented 2 months ago

You can try swapping AdamW for SGD; that should also use much less memory. You can also change this line:

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", min_pixels=256*28*28, max_pixels=512*28*28, padding_side="right")

and lower 256 and 512 to smaller numbers, i.e., images will be downscaled more aggressively.
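
A sketch of both suggestions together (the 128/256 pixel budgets below are illustrative values, not ones from the thread):

from torch.optim import SGD
from transformers import AutoProcessor

# Smaller pixel budgets -> images are downscaled harder -> fewer vision tokens per image.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    min_pixels=128 * 28 * 28,  # was 256 * 28 * 28
    max_pixels=256 * 28 * 28,  # was 512 * 28 * 28
    padding_side="right",
)

# Plain SGD (no momentum) keeps no per-parameter optimizer state,
# unlike AdamW's two moment buffers.
optimizer = SGD(model.parameters(), lr=1e-5)
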

zhangfaen commented 2 months ago

The latest version of this repo can fine-tune with flash-attention-2, which saves memory. Try it. I'll close this issue now.
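
For reference, enabling FlashAttention-2 when loading the model usually looks like this in transformers (a sketch; it assumes the flash-attn package is installed and your transformers version supports the attn_implementation argument):

import torch
from transformers import Qwen2VLForConditionalGeneration

# bf16 weights plus FlashAttention-2 kernels cut attention memory during training.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
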

lonngxiang commented 2 months ago

The Qwen2-VL interface doesn't seem to have vision_model or language_model.

[screenshot of the model object]

zhangfaen commented 2 months ago

(Pdb++) model

# Qwen2VLForConditionalGeneration(
#   (visual): Qwen2VisionTransformerPretrainedModel(
#     (patch_embed): PatchEmbed(
#       (proj): Conv3d(3, 1280, kernel_size=(2, 14, 14), stride=(2, 14, 14), bias=False)
#     )
#     (rotary_pos_emb): VisionRotaryEmbedding()
#     (blocks): ModuleList(
#       (0-31): 32 x Qwen2VLVisionBlock(
#         (norm1): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
#         (norm2): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
#         (attn): VisionSdpaAttention(
#           (qkv): Linear(in_features=1280, out_features=3840, bias=True)
#           (proj): Linear(in_features=1280, out_features=1280, bias=True)
#         )
#         (mlp): VisionMlp(
#           (fc1): Linear(in_features=1280, out_features=5120, bias=True)
#           (act): QuickGELUActivation()
#           (fc2): Linear(in_features=5120, out_features=1280, bias=True)
#         )
#       )
#     )
#     (merger): PatchMerger(
#       (ln_q): LayerNorm((1280,), eps=1e-06, elementwise_affine=True)
#       (mlp): Sequential(
#         (0): Linear(in_features=5120, out_features=5120, bias=True)
#         (1): GELU(approximate='none')
#         (2): Linear(in_features=5120, out_features=1536, bias=True)
#       )
#     )
#   )
#   (model): Qwen2VLModel(
#     (embed_tokens): Embedding(151936, 1536)
#     (layers): ModuleList(
#       (0-27): 28 x Qwen2VLDecoderLayer(
#         (self_attn): Qwen2VLSdpaAttention(
#           (q_proj): Linear(in_features=1536, out_features=1536, bias=True)
#           (k_proj): Linear(in_features=1536, out_features=256, bias=True)
#           (v_proj): Linear(in_features=1536, out_features=256, bias=True)
#           (o_proj): Linear(in_features=1536, out_features=1536, bias=False)
#           (rotary_emb): Qwen2RotaryEmbedding()
#         )
#         (mlp): Qwen2MLP(
#           (gate_proj): Linear(in_features=1536, out_features=8960, bias=False)
#           (up_proj): Linear(in_features=1536, out_features=8960, bias=False)
#           (down_proj): Linear(in_features=8960, out_features=1536, bias=False)
#           (act_fn): SiLU()
#         )
#         (input_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
#         (post_attention_layernorm): Qwen2RMSNorm((1536,), eps=1e-06)
#       )
#     )
#     (norm): Qwen2RMSNorm((1536,), eps=1e-06)
#   )
#   (lm_head): Linear(in_features=1536, out_features=151936, bias=False)
# )

Above is the printout of the Qwen2-VL model.

You can use model.visual or model.visual.patch_embed to access sub-modules of the model and set requires_grad.
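
A minimal sketch applying this to Qwen2-VL (module names are taken from the printout above):

# Freeze the whole vision tower; the language model and lm_head stay trainable.
model.visual.requires_grad_(False)

# Or freeze only the patch embedding:
# model.visual.patch_embed.requires_grad_(False)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params after freezing: {trainable}")
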

lonngxiang commented 2 months ago

(quoting the model printout from the previous reply)

And which module corresponds to the language model?