JiamingSuen opened this issue 7 years ago
Hi @JiamingSuen ,
thanks for checking out our training code!
At the moment we use the hyperparameters as set in the training code. There is probably a lot of room for improving these parameters. The losses will eventually converge if you train for a very long time, but this does not improve test performance.
v2 is an attempt to create a version of our network that can be trained easily with tensorflow.
It is meant as a basis for future experiments to improve the architecture.
First steps towards a better architecture are already in blocks.py.
We share it because we hope it will be useful to other researchers.
As you have probably noticed, the training procedure is quite complex and the training losses can be difficult to understand at first glance. One important remaining task is to provide easy-to-use evaluation code to better assess the network's performance.
Thanks for this amazing work!
Thank you!
Thanks for the reply. I tried initializing the weights with tf.contrib.layers.variance_scaling_initializer(factor=2.0), which is the "MSRA initialization" described in this paper, but it is not helping much.
I will keep updating my progress here.
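For reference, `factor=2.0` with the default fan-in mode corresponds to He/MSRA initialization: zero-mean Gaussian weights with std = sqrt(2 / fan_in). A minimal NumPy sketch of the same computation (shapes here are just an illustration, not taken from the repo):

```python
import numpy as np

def he_normal(shape, fan_in, rng=None):
    # "MSRA" / He initialization: zero-mean Gaussian with
    # std = sqrt(2 / fan_in), matching variance_scaling_initializer(factor=2.0).
    rng = rng or np.random.default_rng(0)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=shape)

# Example: a 3x3 conv kernel with 64 input and 128 output channels,
# so fan_in = 3 * 3 * 64 = 576 and std should be close to 0.059.
w = he_normal((3, 3, 64, 128), fan_in=3 * 3 * 64)
print(w.std())  # close to sqrt(2/576) ≈ 0.059
```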
Asking myself the same thing... I thought TotalLoss should go down after a while, but it does not really look good (160k+ iterations): https://tensorboard.dev/experiment/aay2ZG8aRUaZM1EwML3jPA/#scalars&run=0_flow1%2Ftrainlogs&_smoothingWeight=0.989
Edit: I guess I would need a total loss that does not include the _sig losses (and instead includes the _sig_unscaled losses) to get a nice-looking graph. At least I now understand why the total loss does not decrease much while training itself actually does improve.
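The idea above, building a monitoring-only total from the unscaled components while excluding the scaled _sig terms, can be sketched like this. The loss names and values below are made up for illustration; only the suffix convention comes from the thread:

```python
# Hypothetical per-loss scalars as they might be read from training logs.
# Names are invented for illustration; only the _sig / _sig_unscaled
# suffix convention is taken from the discussion above.
losses = {
    "depth_sig": 0.5,            # scaled term, excluded from monitoring
    "depth_sig_unscaled": 1.25,  # unscaled term, included
    "flow_sig": 0.25,
    "flow_sig_unscaled": 0.75,
}

# Sum only the unscaled components for a cleaner progress curve.
monitor_total = sum(v for k, v in losses.items() if k.endswith("_sig_unscaled"))
print(monitor_total)  # 2.0
```

Plotting `monitor_total` instead of the optimizer's total loss avoids the scaling terms that keep the curve from visibly decreasing.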
Hi @benjaminum, have you ever successfully trained the network with your latest v2 training code? I've been experimenting with it for a few days and found it extremely hard to converge. This is my training loss status (please ignore TotalLoss_1): it was trained for about 500k iterations with the default starting learning rate and batch size on the first evolution. I also tried other learning rate and optimizer configurations, with similar results.
blocks.py, could that be a reason? Why did you make this change, and what does v2 mean exactly? Thanks for this amazing work!