rubenvillegas / iclr2017mcnet

Tensorflow implementation of the ICLR 2017 paper: Decomposing Motion and Content for Natural Video Sequence Prediction
https://sites.google.com/a/umich.edu/rubenevillegas/iclr2017
MIT License

Input data #4

Closed by leesunfreshing 7 years ago

leesunfreshing commented 7 years ago

Hi, according to the paper, the input is the feature maps up to the 3rd pooling layer of VGG-16. I understand the motion_enc serves this purpose. I am wondering whether the pre-trained model includes weights from VGG-16 pretrained on ImageNet, or whether you train it from scratch? Thanks for your time.

rubenvillegas commented 7 years ago

Hi @leesunfreshing ,

Thanks for your question. All the models are trained from scratch. In the paper we mean that we use the VGG-16 architecture, but not the pre-trained weights. Please let me know if you have more questions.
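For concreteness, "VGG-16 architecture up to the third pooling layer" means the conv blocks of 64, 128, and 256 channels, each followed by a 2x2 max-pool, with weights randomly initialized rather than loaded from an ImageNet checkpoint. A minimal stdlib-only sketch (the layer list and names are illustrative, not the repo's actual variables) traces the resulting feature-map shape:

```python
# Illustrative VGG-16 layer pattern up to the third pooling layer ("pool3").
# Weights are trained from scratch in MCnet; only the architecture is reused.
VGG16_UP_TO_POOL3 = [
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool", None),
]

def output_shape(h, w, layers):
    """Trace spatial size and channel count through the block list.
    Convs are 3x3, stride 1, 'same' padding; pools are 2x2, stride 2."""
    c = 3  # RGB input
    for kind, channels in layers:
        if kind == "conv":
            c = channels
        else:  # each 2x2 max-pool halves both spatial dimensions
            h, w = h // 2, w // 2
    return h, w, c

print(output_shape(64, 64, VGG16_UP_TO_POOL3))  # (8, 8, 256)
```

So a 64x64 input frame yields an 8x8x256 feature map after pool3, which is the representation the encoder produces.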

Regards, Ruben

leesunfreshing commented 7 years ago

Thanks for your reply! I do have one more question: are L_GAN and L_img both computed frame by frame and then summed up, as in the deep multi-scale paper, which uses the same loss function? Thanks again.

rubenvillegas commented 7 years ago

L_img is computed frame by frame and then summed up. L_GAN is computed on the entire sequence.
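A stdlib-only sketch of that distinction, with frames as flat float lists and a hypothetical `discriminator` stand-in (the actual losses in the paper combine an Lp term with a gradient-difference loss; plain squared error is used here only to keep the example short):

```python
import math

def frame_loss(pred, target):
    # Per-frame squared error (stand-in for the paper's Lp + GDL terms).
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def l_img(pred_seq, target_seq):
    # L_img: computed frame by frame, then summed across the sequence.
    return sum(frame_loss(p, t) for p, t in zip(pred_seq, target_seq))

def l_gan(pred_seq, discriminator):
    # L_GAN: a single term on the entire sequence; the discriminator
    # sees all predicted frames at once, not one frame at a time.
    d = discriminator(pred_seq)  # probability the sequence is real
    return -math.log(max(d, 1e-12))

# Toy usage with a dummy discriminator that always outputs 0.5:
pred = [[0.0, 1.0], [1.0, 0.0]]
target = [[0.0, 1.0], [0.5, 0.0]]
print(l_img(pred, target))           # 0.25 (one pixel differs by 0.5)
print(l_gan(pred, lambda seq: 0.5))  # -log(0.5) ~= 0.693
```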