showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0
1.04k stars · 44 forks

About checkpoints to be used by finetune #44

Open trmzpi02 opened 1 month ago

trmzpi02 commented 1 month ago

Hello! I am very interested in your work, and I see that you released the weights of Show-o from before fine-tuning on the LLaVA instruction-tuning datasets.

I have the following two questions:

  1. The README recommends fine-tuning from the show-o-512x512-wo-llava-tuning checkpoint. Why not fine-tune from show-o-512x512 instead? Is it because performance degrades on certain downstream tasks after fine-tuning on the LLaVA instruction-tuning datasets?

  2. If I want to fine-tune on certain visual downstream tasks, which checkpoint should I use?

Sierkinhane commented 1 month ago

Hi, thanks for your interest in our work. If you'd like to reproduce our results, you can start from the pre-trained checkpoint. Because the final checkpoint was already fine-tuned on the LLaVA data, further fine-tuning on that same data will degrade performance (overfitting). If you have new training data, I think you can directly fine-tune the final checkpoint.
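To make the recommendation concrete, the choice between the two starting points usually comes down to a single checkpoint path in the training config. The sketch below is hypothetical: the key names (`pretrained_model_path`, `dataset`, `optimizer`) are assumptions for illustration and are not taken from the Show-o repo; only the checkpoint names come from this thread.

```yaml
# Hypothetical fine-tuning config excerpt (key names are illustrative, not Show-o's actual schema).
model:
  # Recommended starting point when fine-tuning on LLaVA-style instruction data,
  # since this checkpoint has not yet seen that data:
  pretrained_model_path: "showlab/show-o-512x512-wo-llava-tuning"
  # For new (non-LLaVA) training data, the final checkpoint can be used instead:
  # pretrained_model_path: "showlab/show-o-512x512"

dataset:
  # Placeholder path; substitute your own instruction-tuning data.
  train_data_path: "/path/to/your/finetune_data"

optimizer:
  # A lower learning rate than pre-training is typical for fine-tuning.
  lr: 1.0e-5
```

The key point from the answer above: pick the checkpoint whose training data does not already include your fine-tuning data, otherwise the extra epochs mostly overfit.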