
Hi, thank you for your great work. I have a question about the training details. I saw in the code that you use the Seq2SeqTrainer class from Hugging Face. It seems that you train your model with a simple cross-entropy loss, like other MLLMs. Is that right?
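For concreteness, this is the standard next-token cross-entropy with teacher forcing that the question assumes (not confirmed here as Shikra's exact objective). $x$ is the input (image and prompt), $y_1, \dots, y_T$ are the target tokens, and the loss is averaged over target positions only:

$$
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right)
$$

Teacher forcing means the conditioning prefix $y_{<t}$ is always the ground-truth prefix, never the model's own generated tokens.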

If the target is "A man[0.220,0.216,0.568,0.830] holding roses[0.404,0.374,0.588,0.758] and a woman[0.606,0.250,0.812,0.830] covering her mouth[0.612,0.358,0.666,0.414].", is the model simply trained with teacher forcing on that target?
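Here is a minimal pure-Python sketch of that setup applied to the example target. It makes several assumptions that do not come from the Shikra code: a hypothetical character-level tokenizer (`encode`; the real LLaMA tokenizer splits numbers differently), a hypothetical prompt string, no image features, and a uniform stand-in for the model (`uniform_model`). The box coordinates are treated as ordinary text tokens. Prompt positions get the label -100, which Hugging Face models exclude from the loss.

```python
import math
from typing import Callable, List, Sequence

IGNORE_INDEX = -100  # Hugging Face convention: positions labelled -100 are excluded from the loss


def encode(text: str, vocab: Sequence[str]) -> List[int]:
    """Hypothetical character-level tokenizer: one id per character."""
    index = {ch: i for i, ch in enumerate(vocab)}
    return [index[ch] for ch in text]


def causal_lm_loss(
    input_ids: Sequence[int],
    labels: Sequence[int],
    next_token_probs: Callable[[Sequence[int]], Sequence[float]],
) -> float:
    """Mean cross-entropy over supervised positions, computed with teacher forcing.

    The prediction made after seeing the gold prefix input_ids[:i] is scored
    against labels[i] (the usual one-position shift). The model is only ever
    conditioned on ground-truth tokens, never on its own outputs.
    """
    total, count = 0.0, 0
    for i in range(1, len(input_ids)):
        if labels[i] == IGNORE_INDEX:
            continue  # prompt tokens are context only, not supervised
        probs = next_token_probs(input_ids[:i])  # gold prefix = teacher forcing
        total -= math.log(probs[labels[i]])
        count += 1
    return total / count


# Hypothetical prompt; the target is the example from the question.
prompt = "Describe the image with boxes: "
target = (
    "A man[0.220,0.216,0.568,0.830] holding roses[0.404,0.374,0.588,0.758] "
    "and a woman[0.606,0.250,0.812,0.830] covering her mouth[0.612,0.358,0.666,0.414]."
)
vocab = sorted(set(prompt + target))
input_ids = encode(prompt + target, vocab)
labels = [IGNORE_INDEX] * len(prompt) + encode(target, vocab)


def uniform_model(prefix: Sequence[int]) -> List[float]:
    # Stand-in for the real MLLM: a uniform distribution over the toy vocabulary.
    return [1.0 / len(vocab)] * len(vocab)


loss = causal_lm_loss(input_ids, labels, uniform_model)
print(f"loss = {loss:.4f}, log|V| = {math.log(len(vocab)):.4f}")
```

With a uniform model every target token costs $\log|V|$, so the loss equals $\log|V|$. Swapping in a real model only changes `next_token_probs`; the teacher-forced loop and the label masking stay the same.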

shikras / shikra

training detail #51