yfzhang114 / SliME

✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Apache License 2.0
134 stars 7 forks

Should these two lines be uncommented during training? #1

Closed zjysteven closed 3 months ago

zjysteven commented 3 months ago

Hi,

Thanks for open-sourcing this awesome work. I was wondering whether the two lines below should be uncommented during training. As it stands, it looks like the answer text embeddings will also be used for the router. https://github.com/yfzhang114/SliME/blob/f019bb32de38ad1c7ab28f4b203c5a0f058796e2/llava/model/llava_arch.py#L165-L166

Thanks

yfzhang114 commented 3 months ago

Thanks for your interest in our project and for reaching out!

Yes, strictly speaking, the answer text embeddings should not be used for the router during training. However, in our experiments we found no significant difference when making this change. We hypothesize that question and answer pairs often exhibit similar patterns, which is why we skipped the mask during training.
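
For readers who want to see what masking the answer tokens would look like, here is a minimal sketch. It is not the code at the linked lines. It assumes the router consumes a mean-pooled text embedding and that prompt/question tokens carry the label `IGNORE_INDEX = -100`, as in LLaVA-style label masking. The function names `mean_pool` and `router_input` are hypothetical, and plain Python lists stand in for tensors to keep the example dependency-free.

```python
# Assumption: label value for tokens excluded from the loss (prompt/question tokens).
IGNORE_INDEX = -100


def mean_pool(embeddings, keep):
    """Average the embedding vectors whose keep flag is True."""
    kept = [vec for vec, flag in zip(embeddings, keep) if flag]
    if not kept:
        raise ValueError("no tokens selected for pooling")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def router_input(embeddings, labels, mask_answer=True):
    """Hypothetical helper: build the pooled text feature given to a router.

    With mask_answer=True only question tokens (label == IGNORE_INDEX)
    contribute; with mask_answer=False every token, including the answer,
    is pooled.
    """
    if mask_answer:
        keep = [label == IGNORE_INDEX for label in labels]
    else:
        keep = [True] * len(labels)
    return mean_pool(embeddings, keep)


# Toy sequence: two question tokens followed by two answer tokens.
embeddings = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0], [0.0, 8.0]]
labels = [IGNORE_INDEX, IGNORE_INDEX, 17, 42]

masked = router_input(embeddings, labels, mask_answer=True)
unmasked = router_input(embeddings, labels, mask_answer=False)
print(masked)    # question tokens only
print(unmasked)  # question and answer tokens
```

In this toy case the two pooled features differ because the question and answer embeddings were chosen to be very different. If question and answer embeddings are similar, as hypothesized above, the two features would be close, which would be consistent with the reported lack of a significant difference.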