zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.
Apache License 2.0

fine-tuning the mmprojector #39

Closed lxr-1204 closed 1 month ago

lxr-1204 commented 1 month ago

Thank you for your outstanding work, which has allowed me to quickly start my fine-tuning process. However, I have the following two questions:

  1. In LoRA fine-tuning of the LLaVA series, the common recipe is to apply LoRA to the LLM while fully fine-tuning the mmprojector. However, I couldn't find a parameter in your code that controls how the mmprojector is tuned.

  2. I have previously tried various fine-tuning codebases, including LLaVA-Next, on 4 L20 (48G) GPUs. With those 7B models my LoRA settings were r=128, alpha=256, maxlength=8096. In your project, however, when I fine-tune llava-interleave-qwen-7b-hf I can only set r=8, alpha=8, maxlength=1024; anything larger results in an OOM error.

zjysteven commented 1 month ago
  1. Currently we only support full finetuning of the mmprojector, while the LLM can be either fully finetuned or LoRA-finetuned.
  2. This repo is built on the Hugging Face (HF) implementations of all supported models. One caveat of the HF implementations is that they did not count vision tokens toward maxlength, which differs from the original LLaVA implementation. For example, for LLaVA-1.5, maxlength=100 in the HF implementation is actually equivalent to maxlength=100 + 576 in the original implementation, since each image contributes 576 tokens. If you were using a very large maxlength, this is the most likely cause of the OOM. More recent releases of the transformers library do count vision tokens toward maxlength, but I haven't had a chance to reflect that update in this repo.

When I'm available (hopefully in the next 2-3 weeks) I will make a major refactor regarding problem 2.
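As a quick sanity check on the arithmetic behind point 2 (576 is LLaVA-1.5's image token count; 1024 is the maxlength from the question; other models have different image token counts):

```python
# Under the older HF behavior, vision tokens were not counted toward
# maxlength, so the sequence actually fed to the LLM is longer.
IMAGE_TOKENS = 576   # LLaVA-1.5: 24x24 image patches
max_length = 1024    # user-specified text maxlength
effective = max_length + IMAGE_TOKENS
print(effective)  # 1600 tokens actually go through the LLM per image
```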

lxr-1204 commented 1 month ago

🎉 Thank you for addressing my questions so promptly. I look forward to seeing even more outstanding results from you.

zjysteven commented 1 month ago

Regarding point 2, I have made the updates, but there are some caveats to be aware of, which are discussed thoroughly in #43.

Closing now. Feel free to reopen if needed.