microsoft / Phi-3CookBook

This is a Phi-3 cookbook for getting started with Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
MIT License

Vision fine-tuning #63

Closed 2U1 closed 2 months ago

2U1 commented 3 months ago

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Mention any other details that might be useful

When fine-tuning the vision model, I think it should be possible to fully fine-tune the vision encoder (non-LoRA) while fine-tuning the language model with LoRA.

I've written code for this by borrowing from LLaVA. I hope it's helpful for updating the training script for fine-tuning the vision model.

https://github.com/2U1/Phi3-Vision-ft
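The split described above can be sketched as follows. This is a minimal illustration, not the linked repository's actual code: the toy module names (`img_processor`, `qkv_proj`) are stand-ins loosely modeled on Phi-3-V naming. The idea is to unfreeze only the vision-encoder parameters for full fine-tuning while leaving the language weights frozen, so that LoRA adapters can then be attached to the language side.

```python
import torch.nn as nn

def split_trainable(model: nn.Module, vision_prefix: str = "img_processor") -> None:
    """Fully fine-tune the vision-encoder params; freeze everything else.
    The frozen language weights would then be handled by LoRA adapters."""
    for name, param in model.named_parameters():
        param.requires_grad = vision_prefix in name

# Toy stand-in with Phi-3-V-like submodule names (illustrative only).
toy = nn.ModuleDict({
    "img_processor": nn.Linear(8, 8),   # vision encoder -> full fine-tune
    "qkv_proj": nn.Linear(8, 24),       # language attention -> frozen (LoRA target)
})
split_trainable(toy)
trainable = [n for n, p in toy.named_parameters() if p.requires_grad]
```

After this pass, only the vision-encoder parameters remain trainable; a LoRA wrapper applied afterwards adds its own small trainable adapter weights on the frozen language modules.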

leestott commented 3 months ago

@2U1 Hi would like make a pull request with you addition?

franperezlopez commented 3 months ago

@2U1 please :raised_hands:

2U1 commented 3 months ago

@leestott @franperezlopez Yes sure!

franperezlopez commented 2 months ago

I'd like to understand why the CLIP model can't be trained using LoRA, as stated in this comment: https://github.com/microsoft/Phi-3CookBook/blob/20d56d79cfd38eb175118ecc961a9b49e2341de2/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_docvqa.py#L94

I made myself a `lora_config` based on this code, and so far it has worked:

    from peft import LoraConfig, TaskType

    linear_modules = [
        # CLIP modules
        'q_proj',  # attention
        'k_proj',
        'v_proj',
        'out_proj',
        'fc1',  # MLP
        'fc2',
        # 'img_projection.0',
        # 'img_projection.2',
        # FIXME: inability to apply LoRA to CLIP is a known issue of Phi-3-V
        # Phi language modules
        'qkv_proj',  # attention
        'o_proj',
        'down_proj',  # MLP
        'gate_up_proj',
        # 'lm_head',
    ]
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=round(rank * alpha_to_rank_ratio),
        lora_dropout=dropout,
        target_modules=linear_modules,
        init_lora_weights='gaussian',
        task_type=TaskType.CAUSAL_LM,
        modules_to_save=["lm_head"],
    )
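As a side note on why one flat `target_modules` list can reach both towers: PEFT selects LoRA sites by suffix-matching each target against the full dotted module path. Below is a pure-Python sketch of that matching rule, with illustrative module paths loosely modeled on Phi-3-V naming (not taken from the actual model):

```python
def matches_target(module_name: str, targets: list[str]) -> bool:
    """Suffix-match rule (as in PEFT): a module gets a LoRA adapter if its
    dotted name equals a target or ends with '.' + target."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)

targets = ["q_proj", "out_proj", "qkv_proj", "o_proj"]

# Illustrative module paths, loosely modeled on Phi-3-V naming.
names = [
    "model.vision_embed_tokens.img_processor.vision_model.encoder.layers.0.self_attn.q_proj",
    "model.vision_embed_tokens.img_processor.vision_model.encoder.layers.0.self_attn.out_proj",
    "model.layers.0.self_attn.qkv_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.act_fn",  # not a target -> left untouched
]
wrapped = [n for n in names if matches_target(n, targets)]
```

This is why the single list above wraps the CLIP attention projections (`q_proj`, `out_proj`, ...) and the Phi language projections (`qkv_proj`, `o_proj`, ...) in one pass, while non-target modules are left untouched.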
2U1 commented 2 months ago

I've been busy with some other work. I'm struggling a bit to fit my changes into one script, since I've modified some code in the processor and image embedding. I'll get it done as soon as possible.

leestott commented 2 months ago

@ChenRocks

Interesting comment:

> I'd like to understand why the CLIP model can't be trained using LoRA, as stated in this comment: https://github.com/microsoft/Phi-3CookBook/blob/20d56d79cfd38eb175118ecc961a9b49e2341de2/code/04.Finetuning/vision_finetuning/finetune_hf_trainer_docvqa.py#L94
>
> I made myself a `lora_config` based on this code, and so far it has worked (see the `linear_modules`/`lora_config` listing above).
ChenRocks commented 2 months ago

On the latest main branch, this is no longer a limitation.

2U1 commented 2 months ago

@leestott @ChenRocks Oh, I was too late for this. I'll close the issue, because bringing in the other changes I've made would be too much for a cookbook.

Thanks for updating the code :) !