Having trouble training

hellochick / PSPNet-tensorflow

TensorFlow-based implementation of "Pyramid Scene Parsing Network".


Having trouble training #65

Open AmeetR opened 5 years ago

AmeetR commented 5 years ago

Hi,

I'm trying to train on Cityscapes, first to replicate the 70% mIoU and then to move on to other driving datasets to see what happens. However, I'm having trouble replicating this: I can't get the loss to converge below 1.7. I'm training from scratch on purpose, to get a clean baseline for the other datasets.

hellochick commented 5 years ago

Hey AmeetR

If you look through the past issues, you will find that we have discussed this problem several times. This repository only converts the pre-trained weights from the original Caffe code to TensorFlow; the training code is just an experiment. If you want to replicate the performance, you first need to implement a synchronized BN layer so that you can train with a large batch size (as described in the paper).
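For readers unfamiliar with the idea, here is a minimal sketch of what "synchronized" means for batch normalization. It is not code from this repository: the helper names are hypothetical, it handles a single channel, and it uses plain Python lists in place of per-GPU tensors. Each device contributes its count, sum, and sum of squares, and every device then normalizes with the combined mean and variance instead of its own small-batch statistics.

```python
# Hypothetical helper names; plain Python, no framework required.

def local_stats(values):
    """Per-device statistics for one channel: (count, sum, sum of squares)."""
    return len(values), sum(values), sum(v * v for v in values)


def synced_mean_var(per_device_values):
    """Combine per-device statistics into one global mean and biased variance.

    In a real multi-GPU setup the three running sums would come from an
    all-reduce across devices; here they are simply added in a loop.
    """
    count = total = total_sq = 0.0
    for values in per_device_values:
        n, s, sq = local_stats(values)
        count += n
        total += s
        total_sq += sq
    mean = total / count
    var = total_sq / count - mean * mean
    return mean, var


def normalize(values, mean, var, eps=1e-5):
    """Normalize one device's activations with the shared statistics."""
    scale = (var + eps) ** -0.5
    return [(v - mean) * scale for v in values]


# Two "devices", each holding a batch of two activations.
per_device = [[1.0, 2.0], [3.0, 6.0]]
mean, var = synced_mean_var(per_device)
normalized = [normalize(values, mean, var) for values in per_device]
```

With a per-device batch of two, each device on its own would see a very different mean and variance (1.5 and 0.25 versus 4.5 and 2.25 here), whereas the synchronized statistics are those of the full batch of four.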

AmeetR commented 5 years ago

Hi @hellochick, thanks for responding. I'll try to implement that layer tomorrow, but every time I try to increase the batch size above two, my GPU runs out of memory. Also, I did look through all of the past issues and couldn't find much of use, which is why I opened a new issue. That said, I'm now getting a loss of ~0.25, but the evaluation is still 0.03. Any idea why this might be?
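When the training loss and the evaluation score disagree this much, one thing worth ruling out is the metric computation itself. The sketch below is a small, framework-free mean IoU check and is not the repository's evaluation script. The function name is hypothetical, and the ignore label of 255 is an assumption about how the labels were prepared. It can be run on a handful of predictions to confirm that label ids and ignored pixels are handled as expected.

```python
def mean_iou(pred, label, num_classes, ignore_label=255):
    """Mean intersection-over-union over flat sequences of class ids."""
    # conf[t][p] counts pixels whose true class is t and predicted class is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, label):
        if t == ignore_label:
            continue  # ignored pixels do not count toward any class
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(conf[t][c] for t in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and label
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Tiny example: four pixels, two classes, last pixel ignored.
score = mean_iou(pred=[0, 0, 1, 1], label=[0, 1, 1, 255], num_classes=2)
```

In this toy case each class has one true positive and one error, so both per-class IoUs are 0.5. If a check like this gives sensible numbers on real predictions while the full evaluation does not, the mismatch more likely lies in preprocessing or label mapping than in the metric.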