google / e3d_lstm

e3d-lstm -- Eidetic 3D LSTM: A Model for Video Prediction and Beyond

Model performance on KTH 10->20 task #1

Open kevinstan opened 5 years ago

kevinstan commented 5 years ago

Hello, thank you for the paper and for releasing the code. I'm having difficulty reproducing the results for the KTH Action task in section 4.2. I've downloaded the pre-trained KTH Action weights (the 200,000-iteration checkpoint) and used them to test the model.

System info:
- python 2.7
- opencv 4.1.0.25
- tensorflow-gpu 1.9.0
- CUDA 9.0
- GPU: TITAN X (Pascal), compute capability 6.1, memoryClockRate 1.531 GHz, pciBusID 0000:03:00.0, totalMemory 11.91 GiB, freeMemory 11.75 GiB

Script:

```bash
#!/usr/bin/env bash
cd ..
python -u run.py \
    --is_training False \
    --dataset_name action \
    --train_data_paths data/kth \
    --valid_data_paths data/kth \
    --pretrained_model kth_e3d_lstm_pretrain/model.ckpt-200000 \
    --save_dir checkpoints/_kth_e3d_lstm \
    --gen_frm_dir results/_kth_e3d_lstm \
    --model_name e3d_lstm \
    --allow_gpu_growth True \
    --img_channel 1 \
    --img_width 128 \
    --input_length 10 \
    --total_length 30 \
    --filter_size 5 \
    --num_hidden 64,64,64,64 \
    --patch_size 8 \
    --layer_norm True \
    --reverse_input False \
    --sampling_stop_iter 100000 \
    --sampling_start_value 1.0 \
    --sampling_delta_per_iter 0.00001 \
    --lr 0.001 \
    --batch_size 2 \
    --max_iterations 1 \
    --display_interval 1 \
    --test_interval 1 \
    --snapshot_interval 5000
```

Output:

```
(e3d_lstm_official) kstan@yixing:~/e3d_lstm/scripts$ ./e3d_lstm_kth_test.sh
Initializing models
2019-05-15 14:37:16.852811: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-05-15 14:37:19.055412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:03:00.0
totalMemory: 11.91GiB freeMemory: 11.75GiB
2019-05-15 14:37:19.055439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-15 14:37:19.262277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-15 14:37:19.262310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0
2019-05-15 14:37:19.262318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N
2019-05-15 14:37:19.262531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11376 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:03:00.0, compute capability: 6.1)
load model: kth_e3d_lstm_pretrain/model.ckpt-200000
begin load datadata/kth
there are 127271 pictures
there are 5200 sequences
begin load datadata/kth
there are 74833 pictures
there are 3167 sequences
2019-05-15 14:39:52 itr: 1
training loss: 16082.05078125
2019-05-15 14:39:52 test...
mse per seq: 1853.1817014023088
96.02373807308271 80.29797137965903 84.68072711946989 83.75463825016179 84.48666421838448 84.61139482557209 85.35639578890967 86.27750272624341 87.66025201745674 89.2119170410002 90.84818150224523 92.64167446828084 94.38503250199183 96.13222195449993 98.02904253614453 99.92525694480216 101.83609684253146 103.8342688265889 105.73710226033657 107.45162212494725
psnr per frame: 23.111416
23.2865 23.752821 23.5958 23.57663 23.51337 23.477915 23.422129 23.364187 23.28756 23.209711 23.131495 23.047438 22.969624 22.893667 22.811342 22.732689 22.653484 22.571104 22.496899 22.43397
ssim per frame: 0.6098243
0.63740635 0.62530535 0.6226238 0.61893517 0.6169444 0.6149846 0.61348057 0.61197215 0.61037815 0.60889727 0.60745543 0.6060252 0.6047545 0.60347193 0.6020237 0.6007725 0.59954363 0.59822935 0.5971006 0.59618074
```

Visual results (attached images): ground-truth frames gt11-gt15 and predicted frames pd11-pd15.

...

It seems like the results are very different from what's presented in the paper -- what might I be doing wrong here?

Note: I've successfully reproduced the results on the Moving MNIST task in section 4.1, achieving the same SSIM and MSE, so I don't think it's a system/hardware issue. It's possible that there is a mistake in the downloaded pretrained KTH Action model.

Best, Kevin

Fangyh09 commented 5 years ago

@kevinstan I think the pretrained model is not the correct one, since the SSIM score is only 0.609. I retrained MIM (another model) and the result is good. BTW, I find this model very slow to train. Have you trained it yourself?

kevinstan commented 5 years ago

@Fangyh09 What is "model MIM"? Do you mean the Moving MNIST model? I've tried training the KTH Action model from scratch, and it doesn't take too long: I'm using 4 GPUs, and it should take 2-3 hours for ~200,000 iterations.

jhhuang96 commented 5 years ago

@kevinstan I'm confused about why the Moving MNIST images of shape (64, 64, 1) are reshaped to (64//patch_size, 64//patch_size, patch_size**2). Is it because of GPU memory?
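(For context, that reshape is a space-to-depth transform: each patch_size x patch_size block of pixels is folded into the channel dimension, so the recurrent cells run on a smaller spatial grid, which mostly saves GPU memory and compute without discarding any pixels. Below is a minimal NumPy sketch of the idea, not the repo's exact preprocessing utility.)

```python
import numpy as np

def reshape_patch(img_tensor, patch_size):
    """Space-to-depth: fold each patch_size x patch_size pixel block into channels."""
    # img_tensor: [batch, seq_length, height, width, channels]
    b, t, h, w, c = img_tensor.shape
    x = img_tensor.reshape(b, t, h // patch_size, patch_size,
                           w // patch_size, patch_size, c)
    # Bring the two patch axes next to the channel axis before flattening them.
    x = x.transpose(0, 1, 2, 4, 3, 5, 6)
    return x.reshape(b, t, h // patch_size, w // patch_size,
                     patch_size * patch_size * c)

# Example: a Moving MNIST batch of shape (8, 20, 64, 64, 1) with patch_size=4
# becomes (8, 20, 16, 16, 16) -- same pixels, smaller spatial grid for the RNN.
```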

xiaomingdujin commented 4 years ago

I cannot get the reported result either.

> @kevinstan I think the pretrained model is not the correct one, since the SSIM score is only 0.609. I retrained MIM (another model) and the result is good. BTW, I find this model very slow to train. Have you trained it yourself?

I also think the pretrained model is not the correct one.

xiaomingdujin commented 4 years ago

Maybe I found the reason: in rnn_cell.py, when calculating the output gate, the new global memory should be returned, but the code returns the old global_memory, which is not updated. However, even if I return the new global_memory, the result is even worse, so I suspect there is a problem in how the temporal information is propagated.
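(As a toy illustration of why returning the stale value matters -- this is not the repo's actual cell code, and the names are made up -- if a recurrent step returns the old memory instead of the updated one, the memory never changes across time steps.)

```python
import numpy as np

def step(x, h, global_memory):
    # Toy stand-in for one recurrent step; names are illustrative only.
    new_global_memory = 0.5 * global_memory + 0.5 * x  # updated memory
    new_h = np.tanh(new_global_memory + h)
    # Returning `global_memory` here instead of `new_global_memory` would be
    # the kind of bug described above: the update is computed but never
    # reaches the next time step, so the memory stays frozen at its initial value.
    return new_h, new_global_memory

h = gm = np.zeros(4)
for x in np.random.randn(10, 4):
    h, gm = step(x, h, gm)
```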

wyb15 commented 4 years ago

We noticed that there is a bug in the current code involving "global_memory", which may be the cause of the mismatch with the pretrained models on the KTH dataset. As this code repo was reproduced after the first author left Google, this issue did not exist in our original experiments, and the results reported in the paper are valid. We are working on fixing this issue and refreshing our pre-trained KTH models. We apologize for the inconvenience and thank you for your patience.

toddwyl commented 4 years ago

> We noticed that there is a bug in the current code involving "global_memory", which may be the cause of the mismatch with the pretrained models on the KTH dataset. As this code repo was reproduced after the first author left Google, this issue did not exist in our original experiments, and the results reported in the paper are valid. We are working on fixing this issue and refreshing our pre-trained KTH models. We apologize for the inconvenience and thank you for your patience.

Is there any progress on this issue? Apart from the error of returning the new global memory, is there any other issue that causes the mismatch?

xiaomingdujin commented 4 years ago

> Is there any progress on this issue? Apart from the error of returning the new global memory, is there any other issue that causes the mismatch?

It's been a month, and there is no progress.

dekaiidea commented 4 years ago

Is there any progress on this issue? I have been waiting for it for a long time.

index19950919 commented 4 years ago

> We noticed that there is a bug in the current code involving "global_memory", which may be the cause of the mismatch with the pretrained models on the KTH dataset. As this code repo was reproduced after the first author left Google, this issue did not exist in our original experiments, and the results reported in the paper are valid. We are working on fixing this issue and refreshing our pre-trained KTH models. We apologize for the inconvenience and thank you for your patience.

I also want to cite your paper, but the code cannot run correctly because of this bug. Has it been fixed yet?

yifanzhang713 commented 4 years ago

Is there any progress on this issue?

SherryyHou commented 4 years ago

I want to know where to download the pretrained model!