thunil / TecoGAN

This repo contains source code and materials for the TEmporally COherent GAN SIGGRAPH project.
Apache License 2.0
5.99k stars 1.14k forks

Training isn't starting with test case 3 or 4 #105

Closed AloshkaD closed 3 years ago

AloshkaD commented 3 years ago

Hi, I downloaded and prepared the dataset. When I choose option 3 or 4 to train the network, all it does is run one round of evaluation on the calendar dataset and then quit. Any help with that is appreciated. Here's my output from `runGan.py 4`:

Testing test case 4
Delete existing folder ex_FRVSR06-23-14/?(Y/N)
y
ex_FRVSR06-23-14_1/
Using TensorFlow backend.
Preparing train_data
[Config] Use random crop
[Config] Use random crop
[Config] Use random flip
Sequenced batches: 27610, sequence length: 10
Preparing validation_data
[Config] Use random crop
[Config] Use random crop
[Config] Use random flip
Sequenced batches: 2860, sequence length: 10
tData count = 27610, steps per epoch 27610
Finish building the network.
Scope generator:
Variable: generator/generator_unit/input_stage/conv/Conv/weights:0
Shape: [3, 3, 51, 64]
Variable: generator/generator_unit/input_stage/conv/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_1/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_1/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_1/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_1/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_2/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_2/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_2/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_2/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_3/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_3/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_3/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_3/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_4/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_4/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_4/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_4/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_5/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_5/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_5/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_5/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_6/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_6/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_6/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_6/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_7/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_7/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_7/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_7/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_8/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_8/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_8/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_8/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_9/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_9/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_9/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_9/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_10/conv_1/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_10/conv_1/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/resblock_10/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/resblock_10/conv_2/Conv/biases:0
Shape: [64]
Variable: generator/generator_unit/conv_tran2highres/conv_tran1/Conv2d_transpose/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/conv_tran2highres/conv_tran1/Conv2d_transpose/biases:0
Shape: [64]
Variable: generator/generator_unit/conv_tran2highres/conv_tran2/Conv2d_transpose/weights:0
Shape: [3, 3, 64, 64]
Variable: generator/generator_unit/conv_tran2highres/conv_tran2/Conv2d_transpose/biases:0
Shape: [64]
Variable: generator/generator_unit/output_stage/conv/Conv/weights:0
Shape: [3, 3, 64, 3]
Variable: generator/generator_unit/output_stage/conv/Conv/biases:0
Shape: [3]
total size: 843587
Scope fnet:
Variable: fnet/autoencode_unit/encoder_1/conv_1/Conv/weights:0
Shape: [3, 3, 6, 32]
Variable: fnet/autoencode_unit/encoder_1/conv_1/Conv/biases:0
Shape: [32]
Variable: fnet/autoencode_unit/encoder_1/conv_2/Conv/weights:0
Shape: [3, 3, 32, 32]
Variable: fnet/autoencode_unit/encoder_1/conv_2/Conv/biases:0
Shape: [32]
Variable: fnet/autoencode_unit/encoder_2/conv_1/Conv/weights:0
Shape: [3, 3, 32, 64]
Variable: fnet/autoencode_unit/encoder_2/conv_1/Conv/biases:0
Shape: [64]
Variable: fnet/autoencode_unit/encoder_2/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: fnet/autoencode_unit/encoder_2/conv_2/Conv/biases:0
Shape: [64]
Variable: fnet/autoencode_unit/encoder_3/conv_1/Conv/weights:0
Shape: [3, 3, 64, 128]
Variable: fnet/autoencode_unit/encoder_3/conv_1/Conv/biases:0
Shape: [128]
Variable: fnet/autoencode_unit/encoder_3/conv_2/Conv/weights:0
Shape: [3, 3, 128, 128]
Variable: fnet/autoencode_unit/encoder_3/conv_2/Conv/biases:0
Shape: [128]
Variable: fnet/autoencode_unit/decoder_1/conv_1/Conv/weights:0
Shape: [3, 3, 128, 256]
Variable: fnet/autoencode_unit/decoder_1/conv_1/Conv/biases:0
Shape: [256]
Variable: fnet/autoencode_unit/decoder_1/conv_2/Conv/weights:0
Shape: [3, 3, 256, 256]
Variable: fnet/autoencode_unit/decoder_1/conv_2/Conv/biases:0
Shape: [256]
Variable: fnet/autoencode_unit/decoder_2/conv_1/Conv/weights:0
Shape: [3, 3, 256, 128]
Variable: fnet/autoencode_unit/decoder_2/conv_1/Conv/biases:0
Shape: [128]
Variable: fnet/autoencode_unit/decoder_2/conv_2/Conv/weights:0
Shape: [3, 3, 128, 128]
Variable: fnet/autoencode_unit/decoder_2/conv_2/Conv/biases:0
Shape: [128]
Variable: fnet/autoencode_unit/decoder_3/conv_1/Conv/weights:0
Shape: [3, 3, 128, 64]
Variable: fnet/autoencode_unit/decoder_3/conv_1/Conv/biases:0
Shape: [64]
Variable: fnet/autoencode_unit/decoder_3/conv_2/Conv/weights:0
Shape: [3, 3, 64, 64]
Variable: fnet/autoencode_unit/decoder_3/conv_2/Conv/biases:0
Shape: [64]
Variable: fnet/autoencode_unit/output_stage/conv1/Conv/weights:0
Shape: [3, 3, 64, 32]
Variable: fnet/autoencode_unit/output_stage/conv1/Conv/biases:0
Shape: [32]
Variable: fnet/autoencode_unit/output_stage/conv2/Conv/weights:0
Shape: [3, 3, 32, 2]
Variable: fnet/autoencode_unit/output_stage/conv2/Conv/biases:0
Shape: [2]
total size: 1745506
The first run takes longer time for training data loading...
Save initial checkpoint, before any training
[testWhileTrain] step 0:
python3 main.py --output_dir ex_FRVSR06-23-14_1/train/ --summary_dir ex_FRVSR06-23-14_1/train/ --mode inference --num_resblock 10 --checkpoint ex_FRVSR06-23-14_1/model-0 --cudaID 0 --input_dir_LR ./LR/calendar/ --output_pre  --output_name 000000000 --input_dir_len 10
Using TensorFlow backend.
input shape: [1, 144, 180, 3]
output shape: [1, 576, 720, 3]
Finish building the network
Loading weights from ckpt model
Frame evaluation starts!!
Warming up 5
Warming up 4
Warming up 3
Warming up 2
Warming up 1
saving image 000000000_0001
saving image 000000000_0002
saving image 000000000_0003
saving image 000000000_0004
saving image 000000000_0005
saving image 000000000_0006
saving image 000000000_0007
saving image 000000000_0008
saving image 000000000_0009
saving image 000000000_0010
total time 1.9974193572998047, frame number 15

It quits after that without any errors.

tom-doerr commented 3 years ago

I don't know why it doesn't work for you, but you could try my Docker environment: https://github.com/tom-doerr/TecoGAN-Docker


AloshkaD commented 3 years ago

Thanks @tom-doerr, I compared the main.py and runGan.py from your repo and they are identical to mine. But later it occurred to me that the problem could be my mixed GPU environment: I have 2080 Tis and 1080 Tis in the same machine, and for some reason it wasn't running on the 2080 Ti. I specified a 1080 Ti's ID and it worked. Thanks!
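(For anyone hitting the same thing: the inference command in the log above already passes a `--cudaID` flag, and a common way to pin a TensorFlow process to one card is to set `CUDA_VISIBLE_DEVICES` before TensorFlow is imported. A minimal sketch, where the device index `"1"` is just an illustrative placeholder for whichever GPU works on your machine:)

```python
import os

# Restrict CUDA to a single device *before* TensorFlow is imported;
# once TF initializes, the set of visible devices is fixed for the process.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # make indices match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"         # placeholder: pick the working GPU

# import tensorflow as tf  # must happen after the lines above
```

Equivalently, `CUDA_VISIBLE_DEVICES=1 python3 runGan.py 4` on the command line has the same effect without editing any source.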

AloshkaD commented 3 years ago

btw @tom-doerr, did you run the training on multiple GPUs using your Docker image?

tom-doerr commented 3 years ago

No, I didn't.
