jy0205 / LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
438 stars · 22 forks

Training model #1

Closed Haonote closed 7 months ago

Haonote commented 8 months ago

Hello, your model is impressive. Could you tell me what kind of GPUs, and how many of them, are needed to train this model?

jy0205 commented 8 months ago

Thanks for your interest. The model was trained on 256 A100 GPUs for 30 hours. You can use fewer GPUs (128 or 64) to train the model, but training will take proportionally longer.
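The trade-off above can be made concrete with a back-of-the-envelope calculation: 256 GPUs for 30 hours is 7680 A100 GPU-hours, so halving the GPU count roughly doubles wall-clock time. This is a sketch assuming idealized linear scaling; real throughput usually degrades somewhat at different GPU counts.

```python
# Total compute reported in the thread: 256 A100 GPUs * 30 hours.
TOTAL_GPU_HOURS = 256 * 30  # 7680 GPU-hours

def estimated_wall_clock_hours(num_gpus: int) -> float:
    """Estimated wall-clock training time, assuming total GPU-hours
    stays constant (linear-scaling idealization)."""
    return TOTAL_GPU_HOURS / num_gpus

for n in (256, 128, 64):
    print(f"{n} GPUs -> ~{estimated_wall_clock_hours(n):.0f} hours")
# 256 GPUs -> ~30 hours, 128 -> ~60 hours, 64 -> ~120 hours
```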

Haonote commented 8 months ago

@jy0205 Thanks for the prompt reply. I would also like to ask whether the training code will be made public. If not, would it be feasible for me to write the training code myself in order to fine-tune the model?

xukunxkxk commented 8 months ago

Sorry, there are currently no plans to open-source the pre-training code. You can write your own fine-tuning code to adapt the model to your specific needs.
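For readers considering this route, the structure of such fine-tuning code is a standard PyTorch loop. The sketch below is hypothetical: LaVIT's actual loading and forward API is not shown in this thread, so a tiny stand-in module is used purely to illustrate freezing the pretrained backbone and updating only a head, a common low-cost adaptation strategy when full pre-training is infeasible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the pretrained model (hypothetical; substitute the real
# LaVIT checkpoint loading here).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Freeze the "backbone" and fine-tune only the final layer.
for p in model[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for preprocessed (image, text) features and labels.
x = torch.randn(8, 16)
y = torch.randint(0, 4, (8,))

model.train()
for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```

In practice you would swap the dummy batch for a DataLoader over your task data and keep the same freeze/optimize/step skeleton.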

yotofu commented 7 months ago

Great job! This could change the multi-modal paradigm. Is there any plan to release just the fine-tuning code, as Qwen-VL did?