Hello, thank you for creating such a great repository. I would like to tune the image encoder as well, since I want to train the model in a completely different domain. Could you possibly give me any hints on how to approach this?
Here are my detailed questions:
Can I tune the model by simply modifying `requires_grad_` in `load_model` and removing the `torch.no_grad` decorator from the `forward` method in `llava/model/multimodal_encoder/clip_encoder.py`?
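For what it's worth, here is a minimal sketch of those two changes, using a plain HuggingFace `CLIPVisionModel` as a stand-in for `CLIPVisionTower` (the tiny config is only so the snippet runs quickly; it is not LLaVA's actual setup):

```python
import torch
from transformers import CLIPVisionModel, CLIPVisionConfig

# Tiny stand-in config; in practice this would be
# CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14").
cfg = CLIPVisionConfig(hidden_size=32, intermediate_size=64,
                       num_hidden_layers=2, num_attention_heads=4,
                       image_size=32, patch_size=16)
vision_tower = CLIPVisionModel(cfg)

# Change 1: load_model in clip_encoder.py calls requires_grad_(False);
# flip it to True (or drop the call) so the encoder weights are trainable.
vision_tower.requires_grad_(True)

# Change 2: run the forward pass WITHOUT the @torch.no_grad() decorator,
# so the computation graph is kept for backprop.
pixel_values = torch.randn(1, 3, 32, 32)
out = vision_tower(pixel_values, output_hidden_states=True)
features = out.hidden_states[-2]  # LLaVA selects a hidden layer, e.g. -2

features.sum().backward()
grad = vision_tower.vision_model.encoder.layers[0].mlp.fc1.weight.grad
print(grad is not None)  # True: gradients now reach the encoder weights
```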
To save and load the trained image encoder, should I modify `_save_checkpoint` in `llava/train/llava_trainer.py` to save it, and `load_model` in `CLIPVisionTower` to load it?
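A rough sketch of the save/load side, with hypothetical names (`save_vision_tower`, `vision_tower.bin` are illustrative, not LLaVA's API), using a plain module as a stand-in for the tuned encoder:

```python
import os
import torch
import torch.nn as nn

# Stand-in for the tuned CLIPVisionTower's underlying model.
vision_tower = nn.Linear(8, 8)

def save_vision_tower(model, output_dir):
    # What a hook in _save_checkpoint could do: dump only the vision
    # tower weights alongside the rest of the checkpoint.
    os.makedirs(output_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(output_dir, "vision_tower.bin"))

def load_vision_tower(model, output_dir):
    # The extra step load_model in CLIPVisionTower would need: after
    # from_pretrained, overwrite the weights with the tuned ones.
    state = torch.load(os.path.join(output_dir, "vision_tower.bin"))
    model.load_state_dict(state)

save_vision_tower(vision_tower, "checkpoint-demo")
restored = nn.Linear(8, 8)
load_vision_tower(restored, "checkpoint-demo")
print(torch.equal(restored.weight, vision_tower.weight))  # True
```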
I'm wondering if I'm missing anything. Thank you.