laughtervv / SGPN

SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation, CVPR, 2018
MIT License

Training problem: very large loss value #5

Closed CrazySnailer closed 6 years ago

CrazySnailer commented 6 years ago

Hi @laughtervv, I downloaded the dataset from https://drive.google.com/file/d/1UjcXB2wMlLt5qwYPk5iSAnlhttl1AO9u/view?ts=5aecefd1

But the training loss I get is very large. This is my training log:

[screenshot of the training log]

The loss value is about 20927.877148. Is there anything wrong with my setup?

laughtervv commented 6 years ago

In my implementation, I did semantic segmentation pretraining with a large batch size, and the BN params are fixed during finetuning. You can also try setting the margins to smaller values (say 2.0/1.0).
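For context, below is a minimal sketch of a double-hinge similarity loss of the kind described in the paper, with the two margins exposed as arguments so they can be lowered (e.g. from 80/10 toward 2.0/1.0). The tensor and argument names are assumptions for illustration, not the exact ones in this repository's code.

```python
import tensorflow as tf

def similarity_loss(dist, same_instance, same_semantic,
                    margin_instance=10.0, margin_semantic=80.0, alpha=2.0):
    """Double-hinge similarity loss over pairwise feature distances.

    dist:           (N, N) pairwise distances between point features
    same_instance:  (N, N) binary mask, 1 where two points share an instance
    same_semantic:  (N, N) binary mask, 1 where two points share a semantic class
    Lowering margin_instance / margin_semantic (e.g. to 1.0 / 2.0) shrinks the
    hinge terms and therefore the overall loss magnitude.
    """
    # Pull features of the same instance together.
    pull = same_instance * dist
    # Push apart points of the same class but different instances (margin K1).
    push_instance = alpha * same_semantic * (1.0 - same_instance) * \
        tf.maximum(margin_instance - dist, 0.0)
    # Push apart points of different classes (margin K2).
    push_semantic = (1.0 - same_semantic) * tf.maximum(margin_semantic - dist, 0.0)
    return tf.reduce_mean(pull + push_instance + push_semantic)
```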

shuluoshu commented 6 years ago

@laughtervv, I am also confused about this. In your paper, you said:

The network is trained with only the L_sim loss for the first 5 epochs.

However, L_sim denotes the similarity matrix learning, while here you mention pretraining the semantic segmentation task first, so I wonder which one should come first.

Besides, I wonder how to pretrain the semantic segmentation labeling in this task; does it mean setting the other losses to zero? Thanks so much!

laughtervv commented 6 years ago

Hi, I trained with only L_sim for 5 epochs after the semantic segmentation pretraining. For semantic segmentation, we trained as in PointNet. You can just set the other losses to zero in our code.
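One way to realize "set the other losses to zero" is to gate each term with a stage-dependent weight. This is only an illustrative sketch with made-up argument names (sem_loss, sim_loss, conf_loss), not the repository's actual training code.

```python
def staged_total_loss(sem_loss, sim_loss, conf_loss, stage):
    """Combine the SGPN loss terms, zeroing the ones unused in the given stage."""
    weights = {
        "pretrain_sem": (1.0, 0.0, 0.0),  # semantic segmentation pretraining
        "sim_only":     (0.0, 1.0, 0.0),  # first 5 epochs: L_sim only
        "joint":        (1.0, 1.0, 1.0),  # full training afterwards
    }
    w_sem, w_sim, w_conf = weights[stage]
    return w_sem * sem_loss + w_sim * sim_loss + w_conf * conf_loss
```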

shuluoshu commented 6 years ago

Thanks for your prompt reply, I will try it. Thanks again! @laughtervv

shuluoshu commented 6 years ago

@laughtervv Hi, it's me again. I did the semantic segmentation pretraining first with a large batch size, is_training=True, and loss = sem_loss only, and it converged. Then I finetuned with only L_sim, using a small batch size, is_training=False, and loss = sim_loss only. However, the loss stays around 2000 or more and does not converge. I wonder if something is wrong. Can you help me? Thanks!
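As a side note on the is_training flag used here: a rough sketch (not the repository's code) of why it matters is that batch-norm layers built with training=False reuse the moving statistics collected during the large-batch pretraining instead of the noisy statistics of a small finetuning batch.

```python
import tensorflow as tf

# Hypothetical helper: with is_training=False the batch-norm layer uses its
# stored moving mean/variance, so the statistics learned during large-batch
# pretraining stay fixed while only the similarity loss is being optimized.
def bn_relu(x, is_training):
    x = tf.compat.v1.layers.batch_normalization(x, training=is_training)
    return tf.nn.relu(x)
```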

laughtervv commented 6 years ago

@shuluoshu Can you try smaller margin values?

CrazySnailer commented 6 years ago

@laughtervv, I used your pretrained model to finetune on Areas 1-6 except Area 5, but after about 34 epochs the loss value stays around 746 or more and no longer decreases. I wonder if something is wrong?

CrazySnailer commented 6 years ago

@laughtervv Hi, did the L_sim loss value converge after training with only L_sim for 5 epochs following the semantic segmentation pretraining? My L_sim was still about 1700 or more after 5 epochs with margins 80 and 10. The following was my loss trend:

[screenshot of the loss trend]

WXinlong commented 6 years ago

@shuluoshu Hi shuluo, I also got a loss of about 2000 when finetuning with only the sim_loss. May I ask how you fixed it? Should I decrease margin_same and margin_diff, and to what values? Thanks for your help in advance!

LZDSJTU commented 6 years ago

@CrazySnailer @shuluoshu @laughtervv @WXinlong I met the same problem: the simmat_loss was high, around 2000. How did you solve it? Thank you very much.

laughtervv commented 6 years ago

@LZDSJTU What is your grouperr (same pos diff) like?

LZDSJTU commented 6 years ago

> @LZDSJTU What is your grouperr (same pos diff) like?

After training 5 epochs (only simmat_loss), the log is as follows:

Batch: 4099, loss: 1858.524854, grouperr: 0.114560, same: 0.035040, diff: 0.005788, pos: 0.330818, ptsseg_loss: 6.585737, confi_loss: 0.158072
Batch: 4109, loss: 1351.917566, grouperr: 0.100161, same: 0.025478, diff: 0.003694, pos: 0.275799, ptsseg_loss: 5.242930, confi_loss: 0.167758
Batch: 4119, loss: 1832.236417, grouperr: 0.116818, same: 0.059874, diff: 0.003689, pos: 0.331090, ptsseg_loss: 8.379278, confi_loss: 0.170734
Batch: 4129, loss: 1832.259976, grouperr: 0.138293, same: 0.027860, diff: 0.002976, pos: 0.411333, ptsseg_loss: 7.827518, confi_loss: 0.145797
Batch: 4139, loss: 1527.208466, grouperr: 0.121687, same: 0.021949, diff: 0.003138, pos: 0.324450, ptsseg_loss: 7.452888, confi_loss: 0.162805

lhiceu commented 5 years ago

Hi @shuluoshu, I'm going to start training, but the loss is always about 3000 ~ 2000. I see your comments about pretraining. I don't know how to do it. Where can I set the loss loss = sim_lossorloss = sem_loss? Should the pretraining start from the trained models peovided by the author?

lhiceu commented 5 years ago

Hi @laughtervv, I saw your comments: you did the semantic segmentation pretraining with a large batch size? So what value should I set the batch size to?