haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

[Question] How do I train the image encoder? #1147

Open Ryoo72 opened 7 months ago

Ryoo72 commented 7 months ago

Question

Hello, thank you for creating such a great repository. I would like to tune the image encoder as well, since I want to train the model in a completely different domain. Could you possibly give me any hints on how to approach this?

Here are my detailed questions:

  1. Can I tune the image encoder by simply modifying the `requires_grad_` call in `load_model` and removing the `torch.no_grad` decorator from the `forward` method in `llava/model/multimodal_encoder/clip_encoder.py`?
  2. To save and load the trained image encoder, should I modify `_save_checkpoint` in `llava/train/llava_trainer.py` to save its weights, and modify `load_model` in `CLIPVisionTower` to load them back?

I'm wondering if I'm missing anything. Thank you.
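For reference, here is a minimal PyTorch sketch of the two steps above. `VisionTower` is a hypothetical stand-in for `CLIPVisionTower` (the real class wraps a Hugging Face `CLIPVisionModel`), not the actual repo code; only the freeze/unfreeze and `state_dict` round-trip mechanics are illustrated.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for CLIPVisionTower: a frozen vision backbone.
class VisionTower(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(16, 8)     # placeholder for the CLIP encoder
        self.encoder.requires_grad_(False)  # frozen by default, as in load_model

    def unfreeze(self):
        # Question 1: re-enable gradients on the encoder parameters
        self.encoder.requires_grad_(True)

    def forward(self, x):
        # Question 1: run the forward pass WITHOUT a torch.no_grad()
        # decorator/context so gradients can flow into the encoder
        return self.encoder(x)

tower = VisionTower()
tower.unfreeze()
tower(torch.randn(2, 16)).sum().backward()
assert tower.encoder.weight.grad is not None  # encoder now receives gradients

# Question 2: persist and restore the tuned encoder via its state_dict,
# analogous to saving it in _save_checkpoint and reloading it in load_model
torch.save(tower.encoder.state_dict(), "/tmp/vision_tower.pt")
restored = VisionTower()
restored.encoder.load_state_dict(torch.load("/tmp/vision_tower.pt"))
```

Whether this is sufficient in the full training pipeline (e.g. interactions with DeepSpeed/FSDP parameter partitioning) is exactly what I am unsure about.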

GohioAC commented 5 months ago

@Ryoo72 did you figure this out?

Ryoo72 commented 5 months ago

@GohioAC I think I got something working along those lines, but I'm not sure it's correct.