thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

The Vicuna LLM is not frozen during pretraining #24

Open ZJULiHongxin opened 2 months ago

ZJULiHongxin commented 2 months ago

Hello! Thank you for open-sourcing this great work. @yaoyuanTHU @guozonghao96 @xrorrim I tried pretraining and fine-tuning LLaVA-UHD and noticed a small inconsistency.

I calculated the number of trainable parameters of the LLM with the following code:

    # Freeze the LLM backbone only when freeze_backbone is set.
    if model_args.freeze_backbone:
        model.model.requires_grad_(False)
    # Count total and trainable parameters of the LLM backbone (model.model).
    trainable_params_info["LLM_backbone"] = {
        "#params": sum(p.numel() for p in model.model.parameters()),
        "#trainable_params": sum(p.numel() for p in model.model.parameters() if p.requires_grad),
    }
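
For reference, here is a minimal sketch (my own, assuming `model` is the model object built in train.py) that groups the remaining trainable parameters by top-level submodule, which makes it easy to see whether the LLM backbone is still being updated:

    from collections import defaultdict

    def trainable_params_by_module(model):
        # Group trainable parameter counts by the first component of the
        # parameter name, e.g. "model" (LLM backbone), "mm_projector", etc.
        counts = defaultdict(int)
        for name, p in model.named_parameters():
            if p.requires_grad:
                counts[name.split(".")[0]] += p.numel()
        return dict(counts)

    print(trainable_params_by_module(model))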

When pretraining with pretrain.sh, the number of trainable LLM parameters is not 0, which contradicts the paper's description: "Stage 1: Pretraining details. During this stage, only the perceiver resampler is tuned."
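
For what it's worth, here is a rough sketch of how I would expect the Stage-1 freezing to be enforced so that only the resampler/projector is updated. This is not the repo's code, and the attribute names (`model.model`, `model.get_model().mm_projector`) are assumptions based on the LLaVA-style model layout:

    # Rough sketch of the assumed intended Stage-1 behavior.
    model.model.requires_grad_(False)  # freeze the Vicuna LLM backbone (and everything nested in it)
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad_(True)  # re-enable the resampler / projector parameters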

Could you please clarify this discrepancy? Thanks in advance.