Open hills-code opened 2 months ago
Hi, the CLIP features are used only for understanding.
So does that mean this model cannot generate images and can only do understanding tasks?
This model can also generate images. Generation does not need any image inputs and just uses [mask] tokens as input.
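To make the point above concrete, here is a minimal sketch of what "just uses [mask] tokens as input" could look like: the text prompt tokens are followed by image positions that are all filled with the [mask] token, so no image is fed in. This is not the repository's actual code. The names `MASK_ID`, `NUM_IMAGE_TOKENS`, and `build_generation_input`, as well as the token ids, are hypothetical and chosen only for illustration.

```python
from typing import List

# Hypothetical values for illustration only; the real vocabulary ids and
# image sequence length depend on the model's tokenizer and config.
MASK_ID = -1            # stand-in id for the [mask] token
NUM_IMAGE_TOKENS = 16   # stand-in number of image token positions


def build_generation_input(
    text_ids: List[int],
    num_image_tokens: int = NUM_IMAGE_TOKENS,
    mask_id: int = MASK_ID,
) -> List[int]:
    """Return the prompt token ids followed by all-[mask] image positions.

    No image tokens or image features are supplied: every image position
    starts as [mask] and is left for the model to predict.
    """
    return list(text_ids) + [mask_id] * num_image_tokens


# Usage: a made-up three-token prompt followed by 16 masked image positions.
prompt_ids = [101, 7, 42]
sequence = build_generation_input(prompt_ids)
```

The model would then be asked to predict the tokens at the masked positions. How that prediction is scheduled is up to the actual implementation and is not shown here.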
Thanks for your great work!
I would like to know whether the show-o+ in this table can generate images, or whether it only serves understanding tasks.