thunil / TecoGAN

This repo contains source code and materials for the TEmporally COherent GAN SIGGRAPH project.
Apache License 2.0

Training loss explosion and question about network training #93

Open ypflll opened 3 years ago

ypflll commented 3 years ago

Hi, thank you for your great work; it has really helped me a lot. When I ran the training code, I found that the content_loss is very large at the beginning, more than 10 million.

I checked the code and found the reason: as mentioned in #75 , no activation function is applied at the end of generator_F, so its output can exceed 1. The preprocess() function then multiplies the output by 2, amplifying it further. Since the output is computed iteratively, with RNN_N=10 it can be amplified by up to 19 times. This results in a very large loss at the beginning.
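To sketch where the 19x figure comes from, here is a toy model of my reading of the code (this is not the repo's actual graph; preprocess() and the additive growth per frame are my assumptions):

```python
def preprocess(x):
    # Assumed behavior: scales [0, 1] inputs to [-1, 1], which
    # doubles the magnitude of any value that already exceeds 1.
    return x * 2 - 1

def peak_amplification(rnn_n):
    # Toy recursion for the worst case: with no final activation,
    # each recurrent step feeds the (preprocessed, i.e. doubled)
    # previous output back in, so the peak magnitude grows by 2
    # per step instead of staying bounded.
    amp = 1.0  # first frame: input only, no recurrent feedback yet
    for _ in range(rnn_n - 1):
        amp += 2.0  # each later step adds a x2-scaled recurrent term
    return amp

print(peak_amplification(10))  # -> 19.0
```

So by frame 10 the unclipped output can be roughly 19 times larger than a bounded one, which would explain the huge initial content_loss.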

So should an activation function like tanh or sigmoid be added at the end of generator_F? I experimented with this and it performs well.
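What I tried is simply squashing the raw output before it is reused recurrently. A minimal numpy sketch of the idea (not the actual TensorFlow graph):

```python
import numpy as np

def bounded_output(raw):
    # Hypothetical fix: squash the generator's raw (unbounded)
    # output into (-1, 1) with tanh before it is fed back into
    # the next recurrent step, so iterative amplification cannot
    # build up across frames.
    return np.tanh(raw)

raw = np.array([-5.0, 0.0, 5.0])
out = bounded_output(raw)
print(np.all(np.abs(out) < 1.0))  # every value stays strictly inside (-1, 1)
```

With the output bounded like this, the recurrent feedback can no longer grow the magnitude frame by frame, and the initial content_loss stays in a normal range.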

By the way, the residual_block of generator_F doesn't use an activation function the way residual blocks usually do. Is there any special consideration behind this?
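To illustrate what I mean by "usually do", here is a toy 1-D comparison of the two orderings (the conv and weights are placeholders, not the repo's layers):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv(x, w):
    # Stand-in for a real convolution: a simple elementwise scale.
    return w * x

def residual_block_classic(x, w1, w2):
    # Common ResNet ordering: an activation AFTER the skip-add,
    # so the block's output is non-negative.
    return relu(x + conv(relu(conv(x, w1)), w2))

def residual_block_no_final_act(x, w1, w2):
    # The variant I see in generator_F: no activation after the
    # skip-add, so the output can be negative / unbounded.
    return x + conv(relu(conv(x, w1)), w2)

x = np.array([-1.0, 2.0])
print(residual_block_classic(x, 1.0, 1.0))       # -> [0. 4.]
print(residual_block_no_final_act(x, 1.0, 1.0))  # -> [-1. 4.]
```

The difference only matters for the sign/range of the block output, but since the generator output is reused recurrently, I wondered whether this choice was deliberate.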