junting / seg2vid

PyTorch implementation of "Video Generation from Single Semantic Label Map", CVPR 2019

Difficulty Reproducing Results #4

Open pat-hanbury opened 5 years ago

pat-hanbury commented 5 years ago

Hello,

I am trying to recreate the results on UCF-101 using the provided pretrained models for PlayingViolin and IceDancing. I am using the test_refine.py script for testing, but I am getting odd results:

[Three GIF attachments: Seg2Vid2, seg2vid-playingviolin, Seg2Vid3]

The only change I made to get the code to run is at this line: https://github.com/junting/seg2vid/blob/8240c8cae9ed3c12fdf0aa9c0432e4336dddbcd0/src/test_refine.py#L123

I had to change this line from z_m = Vb(z_noise.repeat(frame1.size()[0] * 8, 1)) to z_m = Vb(z_noise.repeat(frame1.size()[0] * 2 * 8, 1))

Without this change, I received an error complaining about the incorrect size of z_m.

Traceback (most recent call last):
  File "test_refine.py", line 147, in <module>
    a.test()
  File "test_refine.py", line 125, in test
    y_pred_before_refine, y_pred, mu, logvar, flow, flowback, mask_fw, mask_bw = vae(frame1, data, noise_bg, z_m)
  File "/home/hanburyp/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hanburyp/seg2vid/src/models/multiframe_genmask.py", line 262, in forward
    torch.cat([self.fc(z_m).view(-1, 64, int(opt.input_size[0] / 16), int(opt.input_size[1] / 16)), codex], 1))
RuntimeError: shape '[-1, 64, 16, 16]' is invalid for input of size 8192

Please advise on more details about UCF-101 inferences in order to reproduce your result. Thank you in advance!

junting commented 5 years ago

Hi, thanks for your interest! I checked your GIF files, and it seems that the input images are 256x256. However, the provided pre-trained models were trained on 128x128 images. Could you please run the inference again with 128x128 inputs?

We have also fixed the error you reported.
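
For anyone hitting the same RuntimeError, the numbers in the traceback can be checked by hand. The sketch below is only an illustration of the reshape arithmetic, not code from the repository: the helper name `inferred_batch` is hypothetical, and the stride of 16 and the 64 channels are read off the `view(-1, 64, input_size[0] / 16, input_size[1] / 16)` call in the traceback.

```python
def inferred_batch(numel, channels, height, width, stride=16):
    """Batch size that view(-1, channels, height // stride, width // stride)
    would infer for a tensor with `numel` elements, or None if the
    reshape is invalid (numel is not a multiple of the per-sample size)."""
    per_sample = channels * (height // stride) * (width // stride)
    if numel % per_sample != 0:
        return None
    return numel // per_sample


# 8192 is the element count reported in the traceback.
print(inferred_batch(8192, 64, 256, 256))      # None: 64 * 16 * 16 = 16384
print(inferred_batch(8192, 64, 128, 128))      # 2:    64 * 8 * 8   = 4096
# Doubling the rows of z_m doubles the element count reaching the view.
print(inferred_batch(2 * 8192, 64, 256, 256))  # 1
```

With 256x256 inputs the view needs a multiple of 16384 elements, so 8192 fails. With 128x128 inputs the same 8192 elements reshape cleanly. Doubling the rows of z_m makes the element count fit at 256x256, but that only satisfies the shape check; it does not make the 128x128 pretrained weights appropriate for 256x256 inputs, which would be consistent with the odd results above.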