showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0

input_ids_mmu not appended to input_ids in train_w_clip_vit.py #39

Closed lzn87 closed 1 month ago

lzn87 commented 1 month ago

Hi,

First off, thank you to the authors for open-sourcing this model! I am currently fine-tuning Show-o on LLaVA-style data. I noticed that input_ids_mmu is appended to input_ids in the non-CLIP version (see: https://github.com/showlab/Show-o/blob/7ce44993ef7f8b46c8fa374339beef17dc572033/training/train.py#L582C47-L582C60), but not in the CLIP-ViT version (train_w_clip_vit.py). Is this expected, and if so, why?
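To make the question concrete, here is a minimal sketch of what "appended" means here. It is a plain-Python illustration, not the code from training/train.py: the helper name append_mmu_ids and the toy token ids are hypothetical, and it assumes the two batches are joined along the batch axis and already share a sequence length.

```python
# Hypothetical illustration only; this is NOT the repository's code.
# Assumption: "appended" means joining batches along the batch axis.

def append_mmu_ids(input_ids, input_ids_mmu):
    """Concatenate two batches of token-id sequences along the batch axis."""
    # Both batches must already be padded to the same sequence length.
    if input_ids and input_ids_mmu and len(input_ids[0]) != len(input_ids_mmu[0]):
        raise ValueError("sequence lengths must match before concatenation")
    return input_ids + input_ids_mmu


# Toy batches: two generation sequences and one understanding (MMU) sequence.
t2i_batch = [[1, 2, 3, 4], [5, 6, 7, 8]]
mmu_batch = [[9, 10, 11, 12]]

combined = append_mmu_ids(t2i_batch, mmu_batch)
print(len(combined))  # number of sequences in the combined batch
```

In this sketch the model would see one combined batch of token ids. The question is whether train_w_clip_vit.py intentionally skips this step for the MMU samples.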

Thank you very much in advance!