xiaoachen98 / Open-LLaVA-NeXT

An open-source implementation for training LLaVA-NeXT.

[Question] About finetuning projector #2

Closed · JY-CCK closed this issue 6 months ago

JY-CCK commented 6 months ago

Hello. First of all, thanks for providing LLaVA-Next training code.

I have a question. In the README, you recommend fine-tuning the entire model. Also, according to the print log in train.py, the entire model is trained:

if training_args.unfreeze_mm_vision_tower:
    lr_of_vit = training_args.mm_vision_tower_lr if training_args.mm_vision_tower_lr is not None else training_args.learning_rate
    lr_of_mlp = training_args.mm_projector_lr if training_args.mm_projector_lr is not None else training_args.learning_rate
    training_args.mm_projector_lr = lr_of_mlp
    unfreeze_vit(vision_tower)
    rank0_print(
        f'Tune the entire model! The LR of ViT is {lr_of_vit}. The LR of MLP is {lr_of_mlp}. The LR of LLM is {training_args.learning_rate}')
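
(For reference, unfreeze_vit presumably just flips requires_grad on the vision tower; a minimal sketch of that assumption, not the repo's actual implementation:)

def unfreeze_vit(vision_tower):
    # Assumption: mark every vision-tower parameter as trainable.
    for param in vision_tower.parameters():
        param.requires_grad = True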

But in your scripts, specifically finetune.sh, there is no 'tune_mm_mlp_adapter True'.

What is the right way to fine-tune LLaVA models?

Thanks!

xiaoachen98 commented 6 months ago

You don't need to set the tune_mm_mlp_adapter argument at the fine-tuning stage; by default, both the projector and the LLM are set to be trainable. You can step through the code in a debugger to verify this behavior.
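
For example, a quick way to check is a generic PyTorch sketch (not code from this repo) that counts trainable parameters per component right after the model is built in train.py; the substrings below assume the usual LLaVA module names ('mm_projector', 'vision_tower'):

from collections import defaultdict

def summarize_trainable(model):
    # Group trainable parameter counts by component; the name substrings
    # are assumptions based on typical LLaVA naming, adjust if needed.
    counts = defaultdict(int)
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if 'mm_projector' in name:
            counts['projector'] += param.numel()
        elif 'vision_tower' in name:
            counts['vision_tower'] += param.numel()
        else:
            counts['llm'] += param.numel()
    for component, n in counts.items():
        print(f'{component}: {n / 1e6:.1f}M trainable parameters')

At the fine-tuning stage this should report non-zero counts for both the projector and the LLM, and for the vision tower only when unfreeze_mm_vision_tower is enabled.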