showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0

Evaluation on NLP tasks and training time #36

Open KebinWu opened 1 month ago

KebinWu commented 1 month ago

Thanks for the nice work. I would appreciate your answers to the following two questions.

  1. How long does the whole training (all three stages) take, and what is the training time for each stage?
  2. Since you also include RefinedWeb in the training data, have you evaluated the model on NLP tasks and compared it with the Phi-1.5 model? I expect some performance drop, but I wonder how large it is.

Looking forward to your reply, thank you.

Sierkinhane commented 1 month ago

Hi, it takes around three weeks. As we focus on multimodal understanding and generation, we haven't evaluated Show-o on NLP tasks. :)