In this repo, vision tokens are not counted into `max_length`, which is different from the original implementation of LLaVA. This means that, for LLaVA-1.5 for example, `max_length=100` in the HF implementation is actually equivalent to `max_length=100 + 576` in the original implementation. This is the first reason that may have caused OOM if you were using a very large `max_length`. More recent releases of the transformers library do count vision tokens into `max_length`, but I haven't had a chance to reflect this update in the repo. When I'm available (hopefully in the next 2-3 weeks) I will make a major refactor regarding problem 2.
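To make the token accounting concrete, here is a rough sketch (not code from this repo; 576 is LLaVA-1.5's vision token count) of how the two conventions differ:

```python
# Rough sketch (not repo code): effective sequence length under the two
# max_length conventions for LLaVA-1.5, which uses 576 vision tokens per image.
NUM_VISION_TOKENS = 576  # 24 x 24 patches from CLIP ViT-L/14 at 336px

def total_sequence_length(max_length: int, counts_vision_tokens: bool) -> int:
    """Approximate number of tokens the model actually processes."""
    if counts_vision_tokens:
        # newer transformers: the <image> placeholder is expanded before
        # truncation, so max_length already covers text + vision tokens
        return max_length
    # behaviour assumed in this repo: vision tokens are added on top of
    # the max_length text budget
    return max_length + NUM_VISION_TOKENS

print(total_sequence_length(100, counts_vision_tokens=False))  # 676
print(total_sequence_length(100, counts_vision_tokens=True))   # 100
```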
🎉 Thank you for addressing my questions so promptly. I look forward to seeing even more outstanding results from you.
Regarding point 2, I have made the updates, but there are some caveats to be careful about, which are thoroughly discussed in #43.
Closing now. Feel free to reopen if needed.
Thank you for your outstanding work, which has allowed me to quickly start my fine-tuning process. However, I have the following two questions:
In LoRA fine-tuning of the LLaVA series, the usual recipe is to apply LoRA to the LLM while fully fine-tuning the mm_projector. However, I couldn't find a parameter in your code for controlling how the mm_projector is tuned (see the PEFT sketch after these two questions for one way to express that setup).
I have previously tried various fine-tuning codebases, including LLaVA-NeXT, on 4 L20 (48G) GPUs. For those 7B models, my LoRA parameters were set to r=128, alpha=256, max_length=8096. However, in your project, when I fine-tune llava-interleave-qwen-7b-hf, I can only set r=8, alpha=8, max_length=1024; otherwise I get an OOM error.
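For reference on question 1, here is a minimal sketch, not taken from this repo, of how PEFT can apply LoRA to the language model while fully training the projector via `modules_to_save`. The module names (`language_model`, `multi_modal_projector`, the `*_proj` layers) are assumptions based on the HF `LlavaForConditionalGeneration` class and may differ across transformers versions:

```python
# Sketch only: assumes the HF LLaVA model names its projector
# `multi_modal_projector` and nests the LLM under `language_model`.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    # regex restricts LoRA to the LLM's attention projections, so the
    # vision tower's q_proj/k_proj/v_proj layers are left untouched
    target_modules=r".*language_model.*\.(q_proj|k_proj|v_proj|o_proj)",
    # fully train the projector alongside the LoRA adapters
    modules_to_save=["multi_modal_projector"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With `modules_to_save`, PEFT keeps a full trainable copy of the projector and saves it with the adapter checkpoint, which matches the "LoRA on the LLM, full fine-tune on the projector" recipe described above.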