seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks

Reimplementation training not stable #33

Open xiankgx opened 3 years ago

xiankgx commented 3 years ago

Dear @seoungwugoh, I've read your paper and found your work extremely interesting. I've been trying to reproduce it from the paper, with some minor changes such as the decoder layers. The memory read operation, which closely resembles the transformer attention mechanism, is taken from this repo. Everything else is reimplemented from the paper's description.
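For context, the memory read mentioned above can be sketched roughly as below. This is a minimal PyTorch sketch of an attention-style read over memory frames, not the code from this repo. The function name `memory_read`, the tensor layouts, and the optional `scale` flag (dividing by the square root of the key dimension, as in transformer attention) are assumptions made for illustration.

```python
import math

import torch
import torch.nn.functional as F


def memory_read(mem_key, mem_val, qry_key, qry_val, scale=True):
    """Attention-style read from a space-time memory (illustrative sketch).

    Assumed layouts (hypothetical, chosen for this example):
      mem_key: (B, Ck, T, H, W)   keys of T memory frames
      mem_val: (B, Cv, T, H, W)   values of T memory frames
      qry_key: (B, Ck, H, W)      key of the query frame
      qry_val: (B, Cq, H, W)      value of the query frame
    Returns a tensor of shape (B, Cv + Cq, H, W).
    """
    B, Ck, T, H, W = mem_key.shape
    Cv = mem_val.shape[1]

    # Flatten all space-time memory locations into one axis.
    mk = mem_key.reshape(B, Ck, T * H * W).transpose(1, 2)  # (B, THW, Ck)
    qk = qry_key.reshape(B, Ck, H * W)                      # (B, Ck, HW)

    # Dot-product similarity between every memory and query location.
    affinity = torch.bmm(mk, qk)                            # (B, THW, HW)
    if scale:
        # Assumption: transformer-style scaling by sqrt(key dimension).
        affinity = affinity / math.sqrt(Ck)

    # Normalize over memory locations, so each query pixel gets a
    # distribution over all memory pixels.
    weights = F.softmax(affinity, dim=1)                    # (B, THW, HW)

    # Weighted sum of memory values for each query location.
    mv = mem_val.reshape(B, Cv, T * H * W)                  # (B, Cv, THW)
    read = torch.bmm(mv, weights).reshape(B, Cv, H, W)

    # Concatenate the retrieved values with the query's own value map.
    return torch.cat([read, qry_val], dim=1)
```

The softmax runs over the memory axis (`dim=1`), which is an easy place to introduce a bug in a reimplementation: normalizing over the query axis instead still trains for a while but computes a different quantity.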

When I train the model, the loss goes down initially, then suddenly shoots up after a while. I've tried:

I have not yet tried disabling batch norm as your paper suggests, and I'm using mixed-precision training with Apex AMP.
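One common way to "disable" batch norm in PyTorch is to freeze it, so the layers use their stored running statistics and their affine parameters stop updating. The sketch below shows that reading; whether it matches what the paper means is an assumption, and `freeze_batchnorm` is a hypothetical helper name.

```python
import torch.nn as nn

# Batch norm layer types handled by this sketch.
_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.SyncBatchNorm)


def freeze_batchnorm(model: nn.Module) -> nn.Module:
    """Put every batch norm layer in eval mode and freeze its parameters.

    Assumption: "disabling batch norm" is read here as keeping the
    pretrained running statistics fixed; this is not taken from the repo.
    """
    for module in model.modules():
        if isinstance(module, _BN_TYPES):
            # Eval mode: normalize with running_mean / running_var
            # instead of the statistics of the current mini-batch.
            module.eval()
            # Stop gradient updates to the affine weight and bias.
            for param in module.parameters():
                param.requires_grad_(False)
    return model
```

Calling `model.train()` switches the batch norm layers back to training mode, so a helper like this would need to be applied again after every `model.train()` call, for example at the start of each epoch.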

Have you experienced such training instability before? What do you think could be the problem?