haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] How do I train the image encoder? #1147

Open Ryoo72 opened 7 months ago

Ryoo72 commented 7 months ago

Question

Hello, thank you for creating such a great repository. I would like to tune the image encoder as well, since I want to train the model in a completely different domain. Could you possibly give me any hints on how to approach this?

Here are my detailed questions:

  1. Can I tune the image encoder by simply modifying the `requires_grad_` call in `load_model` and removing the `torch.no_grad` decorator from the `forward` method in `llava/model/multimodal_encoder/clip_encoder.py`?
  2. To save and load the trained image encoder, should I modify `_save_checkpoint` in `llava/train/llava_trainer.py` to save its weights, and modify `load_model` in `CLIPVisionTower` to load them back?

I'm wondering if I'm missing anything. Thank you.
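For reference, here is a minimal PyTorch sketch of the two steps above. `VisionTower` is a hypothetical stand-in for `CLIPVisionTower` (the real class wraps a Hugging Face `CLIPVisionModel`), not the actual repo code; only the freeze/unfreeze and `state_dict` round-trip mechanics are illustrated.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for CLIPVisionTower: a frozen vision backbone.
class VisionTower(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(16, 8)     # placeholder for the CLIP encoder
        self.encoder.requires_grad_(False)  # frozen by default, as in load_model

    def unfreeze(self):
        # Question 1: re-enable gradients on the encoder parameters
        self.encoder.requires_grad_(True)

    def forward(self, x):
        # Question 1: run the forward pass WITHOUT a torch.no_grad()
        # decorator/context so gradients can flow into the encoder
        return self.encoder(x)

tower = VisionTower()
tower.unfreeze()
tower(torch.randn(2, 16)).sum().backward()
assert tower.encoder.weight.grad is not None  # encoder now receives gradients

# Question 2: persist and restore the tuned encoder via its state_dict,
# analogous to saving it in _save_checkpoint and reloading it in load_model
torch.save(tower.encoder.state_dict(), "/tmp/vision_tower.pt")
restored = VisionTower()
restored.encoder.load_state_dict(torch.load("/tmp/vision_tower.pt"))
```

Whether this is sufficient in the full training pipeline (e.g. interactions with DeepSpeed/FSDP parameter partitioning) is exactly what I am unsure about.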

GohioAC commented 5 months ago

@Ryoo72 did you figure this out?

Ryoo72 commented 5 months ago

@GohioAC I think I got something working along those lines, but I'm not sure it's correct.