showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0

Does show-o support multimodal-in multimodal-out? #27

Open URRealHero opened 2 months ago

URRealHero commented 2 months ago

As the title says, does Show-o support multimodal-in, multimodal-out (with multiple images)?

URRealHero commented 2 months ago

I've noticed that mixed-modal generation is listed as a pending requested feature. Can I write a script to do that myself, or does the model not have that ability yet?

Sierkinhane commented 2 months ago

Hi, we have explored mixed-modality generation. However, we have not yet uploaded such pre-trained weights for this version. We are considering it for the next update, but we are not sure about the timeline.

URRealHero commented 2 months ago

Thanks very much! Is this handled in the instruction tuning part, or do I have to pretrain the model again?

Sierkinhane commented 2 months ago

Hey, it is included in the instruction tuning stage.

URRealHero commented 2 months ago

Thanks a lot! I'll try to do that~

URRealHero commented 2 months ago

Hi there, I don't understand how to use a new dataset to finetune the model.

1. Do I need to pretrain from the beginning to get the checkpoint for stage 2? If not, where can I get the pretrained parameters?
2. For a new dataset, do I have to create a similar instruction-tuning yaml like the one you use for LLaVA tuning?
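For reference, a minimal sketch of adapting an existing instruction-tuning yaml to a new dataset. This is only an assumption of the workflow: the config path and field names below (`dataset.params.train_shards_path_or_url`, `experiment.name`) are hypothetical placeholders, not the repo's confirmed schema; the idea is to copy the repo's LLaVA tuning yaml and override the dataset-related entries.

```python
# Hypothetical sketch: mirror the repo's existing LLaVA instruction-tuning yaml
# and swap in your own dataset. All key names here are assumptions.
from omegaconf import OmegaConf

# Load the existing instruction-tuning config as a template (path is illustrative).
config = OmegaConf.load("configs/llava_instruction_tuning.yaml")

# Point the dataset section at your own data; the actual keys depend on the
# repo's dataloader, so check the template yaml for the real field names.
config.dataset.params.train_shards_path_or_url = "/path/to/your/dataset"
config.experiment.name = "showo-finetune-custom"

# Save the modified config to pass to the training script.
OmegaConf.save(config, "configs/my_custom_tuning.yaml")
```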