sergeytulyakov / mocogan

MoCoGAN: Decomposing Motion and Content for Video Generation

Inception Score on UCF101 #16

Closed vladyushchenko closed 6 years ago

vladyushchenko commented 6 years ago

Hi,

I am trying to reproduce the Inception Score results on the UCF101 dataset. Could you please point out which model and parameters (number of generated videos, splits) were used for the stated result? Did you use the implementation from the TGAN paper or another repository?

Thanks in advance!

sergeytulyakov commented 6 years ago

Hi Vlad,

Are you asking about the "inception-score" network? If yes, then I used TGAN's Chainer code to compute it.
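
For reference, here is a minimal sketch of the split-based Inception Score computation (this is not TGAN's exact Chainer code; it assumes you already have softmax outputs from a UCF101 video classifier such as C3D for a set of generated videos):

```python
import numpy as np

def inception_score(probs, n_splits=10):
    """Compute the Inception Score from classifier softmax outputs.

    probs: array of shape (num_videos, num_classes) holding p(y|x) for each
    generated video. Returns mean and std over `n_splits` splits.
    """
    scores = []
    for chunk in np.array_split(probs, n_splits):
        p_y = chunk.mean(axis=0, keepdims=True)  # marginal p(y) over the split
        kl = chunk * (np.log(chunk + 1e-12) - np.log(p_y + 1e-12))
        scores.append(np.exp(kl.sum(axis=1).mean()))  # exp(E_x KL(p(y|x) || p(y)))
    return float(np.mean(scores)), float(np.std(scores))

# Hypothetical usage: `probs` comes from running the classifier on generated videos.
# mean_is, std_is = inception_score(probs, n_splits=10)
```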

Sergey

vladyushchenko commented 6 years ago

Hi Sergey,

I have trained MoCoGAN on UCF101 (train split only) using the following parameters: {'--batches': '100000', '--dim_z_category': '101', '--dim_z_content': '50', '--dim_z_motion': '10', '--every_nth': '2', '--image_batch': '32', '--image_dataset': '', '--image_discriminator': 'PatchImageDiscriminator', '--image_size': '64', '--n_channels': '3', '--noise_sigma': '0.1', '--print_every': '1000', '--use_categories': False, '--use_infogan': True, '--use_noise': True, '--video_batch': '32', '--video_discriminator': 'CategoricalVideoDiscriminator', '--video_length': '16', '': '/fastdata/UCF101/transformed_train'}

I get an IS of 11.0 for the model after 100,000 iterations; however, the IS reported in the paper is around 12.4. Could you please comment on whether you used the entire UCF dataset and whether any of the listed parameters differ?

Vlad

sergeytulyakov commented 6 years ago

As far as I remember, we didn't compare on categorical video generation, so dim_z_category should be 0, and I used PatchVideoDiscriminator to get this score.
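
Concretely, these are the two settings that would change relative to the parameter dictionary posted above (a sketch based on the flag names Vlad listed, not a verified reproduction recipe):

```python
# Changes relative to the run posted above, per this comment.
overrides = {
    '--dim_z_category': '0',                             # no categorical dimension for the IS comparison
    '--video_discriminator': 'PatchVideoDiscriminator',  # instead of CategoricalVideoDiscriminator
}
```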

MannyKayy commented 5 years ago

@VladYushchenko were you able to replicate the IS score? I am also getting an IS of around 11.

mkhodabandeh commented 5 years ago

@MannyKayy were you able to replicate the other results of the paper?

MannyKayy commented 5 years ago

@mkhodabandeh Which results are you referring to specifically?

KiBeomHong commented 5 years ago

@VladYushchenko Thanks for sharing your results. Could you let me know how you pre-processed the input frames for the UCF-101 dataset? Or could you share the transformed UCF101 dataset? Thanks

vladyushchenko commented 5 years ago

@KiBeomHong I used the code from this repo. Basically, you need to download UCF101.zip, unpack it, and launch two Python scripts sequentially. You need the ffmpeg tool for that; please also check the split you need in ucfTrainTestlist and whether the names are spelled correctly.

Also, you might want to make some changes in the scripts, e.g. adjust the filenames and/or the image extension. A rough sketch of the frame-extraction step is below.
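
The sketch calls ffmpeg from Python; the paths and the per-frame PNG layout are assumptions, and the scripts in the repo linked above may produce a different layout (e.g. frames stitched into a single image per video):

```python
import subprocess
from pathlib import Path

# Hypothetical paths: the unpacked UCF101 .avi files and the output folder
# used in the training command posted earlier in this thread.
src_root = Path('/fastdata/UCF101/UCF-101')
dst_root = Path('/fastdata/UCF101/transformed_train')

for avi in sorted(src_root.rglob('*.avi')):
    out_dir = dst_root / avi.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    # Resize to 64x64 and dump lossless PNG frames (JPG adds compression artifacts).
    subprocess.run(
        ['ffmpeg', '-y', '-i', str(avi), '-vf', 'scale=64:64', str(out_dir / '%05d.png')],
        check=True,
    )
```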

vladyushchenko commented 5 years ago

@MannyKayy Actually, I got pretty close, but could not replicate the value reported in the paper. It depends a lot on the hyperparameters used for training and on the IS evaluation protocol from TGAN.

What partially helped me: I extracted the data in PNG format rather than JPG.