showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0
1.04k stars, 44 forks

Request for external caption files #52

Open · opened by mmderakhshani 1 week ago

mmderakhshani commented 1 week ago

Hi there,

Thank you for sharing this excellent GitHub repository.

The LAION-aesthetics-12M and JourneyDB datasets used in both the second and third stages of training have been recaptioned with the ShareGPT4V model.

We are working on reproducing your results and have successfully completed the first stage. To continue training, we would like to request the following three annotation files:

- external_journeydb_caption_path: "/mnt/bn/vgfm2/test_mlx/xavier/code/3062/open_muse/train_journeydb_anno.json"
- external_laion12m_caption_path: "/mnt/bn/vgfm/laion5b/laion-aesthetics-12m-captions"
- /mnt/bn/vgfm2/test_dit/LlmDiffuser_phi1.5/LlmDiffuser/questions.json

Could you please share these files with us? We cannot continue reproducing your results without them.

If sharing these files is not possible, could you at least provide the code to regenerate them? That way, we can handle the recaptioning internally. Much appreciated.

Sierkinhane commented 1 week ago

Hi, you can find the JourneyDB annotation here, and questions.json is in the ./training directory. For LAION-12M, you can recaption it using off-the-shelf MLLMs such as the Qwen series or ShareGPT4V.

mmderakhshani commented 1 week ago

Perfect, thanks a lot for this. Could you please let me know which prompt you used for recaptioning?

Sierkinhane commented 1 week ago

Hi, maybe you can try "Describe this image and its style in a very detailed manner" or "Describe this image in as much detail as possible".
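The thread does not include Show-o's own recaptioning code, so the following is only a minimal sketch of how one might batch-recaption a local image folder with one of the prompts above. Everything beyond the two prompts is an assumption: `caption_fn` is a hypothetical wrapper you would write around your chosen MLLM (e.g. a Qwen-VL or ShareGPT4V inference call), and the flat `{relative_path: caption}` JSON output is a guessed format, not necessarily what `external_laion12m_caption_path` expects.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable

# The two prompts suggested in this thread for detailed recaptioning.
PROMPTS = (
    "Describe this image and its style in a very detailed manner",
    "Describe this image in as much detail as possible",
)

# Assumption: these are the image extensions present in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def iter_images(root: Path) -> Iterable[Path]:
    """Yield image files under root (recursively) in sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def recaption(
    root: Path,
    caption_fn: Callable[[Path, str], str],
    prompt: str = PROMPTS[0],
) -> Dict[str, str]:
    """Map each image's path (relative to root) to a generated caption.

    caption_fn is hypothetical: it takes an image path and a prompt and
    returns the MLLM's text output for that image.
    """
    captions: Dict[str, str] = {}
    for path in iter_images(root):
        key = path.relative_to(root).as_posix()
        captions[key] = caption_fn(path, prompt).strip()
    return captions


def save_captions(captions: Dict[str, str], out_file: Path) -> None:
    # Assumed output format: one flat JSON object {relative_path: caption}.
    # Check what the Show-o dataloader actually reads before relying on it.
    out_file.write_text(json.dumps(captions, ensure_ascii=False, indent=2))
```

Keeping the model call behind `caption_fn` means the same loop works whether you run Qwen, ShareGPT4V, or another MLLM; only the wrapper changes.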

mmderakhshani commented 6 days ago

Perfect, thanks for this. I will try them and get back to you, if you don't mind.

Sierkinhane commented 6 days ago

Feel free to ask :)