xichenpan / ARLDM

Official Pytorch Implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
https://arxiv.org/abs/2211.10950
MIT License

dataset for VIST #3

Closed kriskrisliu closed 1 year ago

kriskrisliu commented 1 year ago

Thanks for open-sourcing this! I'm trying to reproduce this work and noticed that the datasets include both VIST-SIS and VIST-DII. However, the download script (vist_img_download.py) only mentions downloading DII, while vist_hdf5.py appears to convert only SIS.

My question is: how exactly should VIST-SIS and VIST-DII be organized for training?

xichenpan commented 1 year ago

Hi, VIST-DII and VIST-SIS share the same images but have different captions. vist_img_download.py uses only the DII JSONs, but it downloads the images needed for both VIST-SIS and VIST-DII. vist_hdf5.py also saves both the SIS and DII captions into one HDF5 file; you can access them through the sis and dii keys, as shown in https://github.com/Flash-321/ARLDM/tree/main/data_script#L97-L100.
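
To illustrate the point about shared images, here is a minimal sketch and not the repository's actual code. It uses toy annotation records, and the field names `photo_flickr_id` and `text` are assumptions about the annotation layout. A plain dict stands in for the HDF5 file, with `sis` and `dii` keys mirroring the ones mentioned above. It shows why a download list built from DII alone can cover SIS, and how both captions can sit next to each image.

```python
# Toy annotation records. The field names (photo_flickr_id, text) are
# assumptions for illustration, not copied from the ARLDM scripts.
dii_annotations = [
    {"photo_flickr_id": "101", "text": "a dog runs on the beach"},
    {"photo_flickr_id": "102", "text": "two people sit by a fire"},
]
sis_annotations = [
    {"photo_flickr_id": "101", "text": "we took the dog to the beach"},
    {"photo_flickr_id": "102", "text": "that night we lit a campfire"},
]


def image_ids(annotations):
    """Return the set of image IDs referenced by a list of annotations."""
    return {a["photo_flickr_id"] for a in annotations}


def captions_by_image(annotations):
    """Map each image ID to its caption text."""
    return {a["photo_flickr_id"]: a["text"] for a in annotations}


# Build the download list from the DII annotations only.
to_download = image_ids(dii_annotations)

# SIS images not covered by the DII-based download list (empty here,
# because both caption sets reference the same images).
missing = image_ids(sis_annotations) - to_download

dii_caps = captions_by_image(dii_annotations)
sis_caps = captions_by_image(sis_annotations)

# One record per image holding both captions. This dict is a hypothetical
# stand-in for the single HDF5 file with "sis" and "dii" keys.
merged = {
    img_id: {"sis": sis_caps[img_id], "dii": dii_caps[img_id]}
    for img_id in sorted(to_download)
    if img_id in sis_caps
}

print(missing)
print(merged["101"]["sis"])
print(merged["101"]["dii"])
```

With the real dataset, the same check (comparing the image IDs referenced by the SIS and DII JSONs) is a quick way to confirm that nothing is missing after the download step.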

kriskrisliu commented 1 year ago

Cool!