wangyuchi369 / LaDiC

[NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?
https://arxiv.org/pdf/2404.10763.pdf

about datasets/mean_emb_split.pickle and datasets/std_emb_split.pickle file #1

Open wanghua-lei opened 4 months ago

wanghua-lei commented 4 months ago

Could you please upload the datasets/mean_emb_split.pickle and datasets/std_emb_split.pickle files?

wangyuchi369 commented 4 months ago

Hi @wanghua-lei, we've uploaded the relevant files to the datasets/ folder. Thanks for bringing this issue to our attention.

Note also that the mean and std depend on several factors, including the dataset itself, the text encoder used, and even the specific subset you select. We therefore strongly recommend that you gather a subset of your own dataset, extract the corresponding text features, and compute the mean and std for your specific configuration.
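For readers who want to recompute these statistics, here is a minimal sketch of the idea. It is not the repository's own script. It assumes the text features form an array of shape (num_samples, seq_len, hidden_dim) and that the statistics are one mean and one std per hidden dimension; the layout actually stored in the uploaded pickle files may differ, so check it before overwriting anything. The helper names `compute_embedding_stats` and `save_stats` are hypothetical, and random data stands in for real text-encoder outputs.

```python
import pickle

import numpy as np


def compute_embedding_stats(features, eps=1e-6):
    """Per-dimension mean and std over all tokens of all samples.

    features: array of shape (num_samples, seq_len, hidden_dim).
    The per-dimension layout is an assumption, not confirmed by the repo.
    """
    flat = features.reshape(-1, features.shape[-1])
    mean = flat.mean(axis=0)
    # Clamp the std so that normalization never divides by zero.
    std = np.maximum(flat.std(axis=0), eps)
    return mean, std


def save_stats(mean, std, mean_path, std_path):
    """Write the two statistics to separate pickle files."""
    with open(mean_path, "wb") as f:
        pickle.dump(mean, f)
    with open(std_path, "wb") as f:
        pickle.dump(std, f)


# Stand-in for real features: replace with the outputs of your text encoder
# on a subset of your captions.
rng = np.random.default_rng(0)
features = rng.normal(loc=2.0, scale=3.0, size=(64, 24, 16)).astype(np.float32)

mean, std = compute_embedding_stats(features)

# Normalizing with these statistics gives roughly zero mean and unit std.
normalized = (features - mean) / std
```

Calling `save_stats(mean, std, "datasets/mean_emb_split.pickle", "datasets/std_emb_split.pickle")` would then write the files under the names used in this thread. Back up the uploaded versions first.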

wanghua-lei commented 4 months ago

Thank you!

wanghua-lei commented 3 months ago

Hello, author. When I use the default configuration, I often get results that are just "." during inference. Could you share your pre-trained weights for fine-tuning?

wangyuchi369 commented 3 months ago

Hi @wanghua-lei, we have uploaded a version of our pre-trained weights to the 'Pretrained models' section for fine-tuning. We have also updated some configurations in the uploaded files, so we recommend that you re-pull the repository.

wanghua-lei commented 3 months ago

Thanks a lot. I have requested access on drive.google.com.