TensorBoard with good prediction, however, evaluation result is bad

dannydrinkswater commented 6 years ago

question

The predictions shown in TensorBoard are pretty decent. However, when I evaluate the model, the predictions become very noisy and the accuracy is very bad. Would you please have a look?

hiwonjoon commented 6 years ago

Hi, thanks for testing my implementation.

While wrapping up my code, I left out the part that updates the running statistics (means and stds) for batchnorm. A batchnorm layer uses on-the-fly batch statistics during training but needs the accumulated mean and std values at evaluation time, so this explains why you get drastically worse performance at evaluation time.
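To illustrate the effect, here is a minimal pure-Python sketch of a scalar batchnorm. It is not the code from this repository: the class `BatchNorm1D`, its `update_running` flag, and the toy data are assumptions made up for the example. The flag only simulates a training loop that never updates the running statistics.

```python
import math


class BatchNorm1D:
    """Toy scalar batch norm without learned scale/shift (illustration only)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        # Initial running statistics, i.e. what evaluation sees if they are never updated.
        self.running_mean = 0.0
        self.running_var = 1.0

    def __call__(self, batch, training, update_running=True):
        if training:
            # Training: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            if update_running:
                m = self.momentum
                self.running_mean = m * self.running_mean + (1 - m) * mean
                self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Evaluation: normalize with the accumulated statistics.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


batch = [10.0, 12.0, 14.0, 16.0]

# Simulated bug: running statistics are never updated during training.
buggy = BatchNorm1D()
for _ in range(200):
    train_out = buggy(batch, training=True, update_running=False)
buggy_eval = buggy(batch, training=False)

# With the update in place, evaluation matches what training produced.
fixed = BatchNorm1D()
for _ in range(200):
    fixed(batch, training=True)
fixed_eval = fixed(batch, training=False)

print("train:     ", [round(v, 3) for v in train_out])
print("buggy eval:", [round(v, 3) for v in buggy_eval])
print("fixed eval:", [round(v, 3) for v in fixed_eval])
```

In the simulated buggy case the evaluation outputs are far from the normalized values the following layers saw during training, which is consistent with predictions that look fine during training and noisy at evaluation.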

I fixed this bug, so please try again. If you still see strange results, please let me know. Thanks!

P.S. If you do not want to retrain the whole model, you can slightly modify the evaluation code instead. Change the code at main.py#L179 from

        net = FRRN(None,None,kwargs['K'],ims,lbs,partial(_arch_type_a,NUM_CLASSES),params,False)

to

        net = FRRN(__some tf variable__,__some tf variable __ ,kwargs['K'],ims,lbs,partial(_arch_type_a,NUM_CLASSES),params,True)

(Note that the last parameter changed to True. This lets the network use on-the-fly statistics instead of the means and stds that were never updated.)

Then use the same batch size for evaluation as for training.
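The batch size matters with this workaround because on-the-fly statistics are computed from whatever batch is fed in. A small sketch of that dependence, using a hypothetical helper `normalize_with_batch_stats` and made-up values (not code from this repository):

```python
import math


def normalize_with_batch_stats(batch, eps=1e-5):
    """Normalize a list of scalars with the mean and variance of that same list."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


sample = 10.0

# The same sample gets a different normalized value depending on its batch.
in_batch_of_four = normalize_with_batch_stats([sample, 12.0, 14.0, 16.0])[0]
in_batch_of_two = normalize_with_batch_stats([sample, 12.0])[0]
alone = normalize_with_batch_stats([sample])[0]  # batch of one: mean == sample

print(in_batch_of_four, in_batch_of_two, alone)
```

With a batch of one, the sample equals the batch mean and is mapped to zero, so evaluating one image at a time would discard most of the signal in this toy setting.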

dannydrinkswater commented 6 years ago

Thanks for your quick reply. I tried modifying the evaluation code, but it didn't give much improvement. I am retraining the model with 40000 iterations this time and will post the result once it's finished.

dannydrinkswater commented 6 years ago

I've found a possible cause of the problem. If I modify the code to instantiate BatchNorm() inside the nested function "def _frru", I get bad results. Your original implementation, on the other hand, instantiates BatchNorm() outside and passes it to "def _frru" as an argument, which gives correct predictions.
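As a generic illustration only (not a confirmed diagnosis for this repository), here is one way the two patterns can differ for any stateful object: an instance created inside a function starts from fresh state on every call, while an instance passed in keeps its accumulated state. The class `RunningMean` and both functions are hypothetical stand-ins.

```python
class RunningMean:
    """Hypothetical stand-in for a stateful layer such as batch norm."""

    def __init__(self, momentum=0.5):
        self.momentum = momentum
        self.value = 0.0

    def update(self, x):
        self.value = self.momentum * self.value + (1 - self.momentum) * x
        return self.value


def unit_with_internal_state(x):
    tracker = RunningMean()  # new object, so state restarts on every call
    return tracker.update(x)


def unit_with_shared_state(x, tracker):
    return tracker.update(x)  # state lives in the caller's object


internal = [unit_with_internal_state(8.0) for _ in range(4)]

shared_tracker = RunningMean()
shared = [unit_with_shared_state(8.0, shared_tracker) for _ in range(4)]

print(internal)  # never moves past the first update
print(shared)    # converges toward the input value
```

Whether this is what happens with BatchNorm() inside "def _frru" depends on how the class creates and names its variables, which this sketch does not model.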

hiwonjoon / tf-frrn

TensorBoard with good prediction, however, evaluation result is bad #2