The adv_loss curve is strange.

hfslyc / AdvSemiSeg

Adversarial Learning for Semi-supervised Semantic Segmentation, BMVC 2018


The adv_loss curve is strange. #39

Closed Pyten closed 4 years ago

Pyten commented 5 years ago

Hi! First, thank you for this very helpful repo. While training with the same discriminator as yours, I ran into a problem I can't figure out: the discriminator losses on pred and GT stay almost unchanged for most of training. Is this normal? By the way, I haven't added the semi-supervised data yet.

[Figure: adv_loss curve]

hfslyc commented 5 years ago

Hi,

It seems that your discriminator has pretty much converged. Usually, we would expect loss_D to be around 0.2-0.5 when adversarial training is working properly. What kind of data are you using?

Pyten commented 5 years ago

Hi, thank you for your reply. My dataset is synthetic document data. I found a probable cause for the curve: I had added an extra sigmoid before the BCE-with-logits loss, which already applies a sigmoid internally. But when I remove the extra sigmoid, the result gets much worse than before. With lambda_adv = 0.01, batch size = 8, and the other parameters the same as yours, I got the following figure. I am still not sure whether this is normal. Please give me some advice.
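To make the double-sigmoid effect concrete, here is a minimal sketch in plain Python. It assumes a scalar discriminator output and a target of 1, and the helper names `sigmoid` and `bce_with_logits` are hypothetical stand-ins, not the repo's code. It shows that squashing the logit first confines the loss to a narrow band, which would be consistent with a curve that barely moves.

```python
import math

def sigmoid(x):
    # Plain logistic function.
    return 1.0 / (1.0 + math.exp(-x))

def bce_with_logits(logit, target):
    # Numerically stable binary cross-entropy on a raw logit:
    # max(x, 0) - x * t + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

logits = [-10.0, -2.0, 0.0, 2.0, 10.0]

# Correct usage: the loss receives the raw logit.
correct = [bce_with_logits(x, 1.0) for x in logits]

# Double sigmoid: the loss receives a value already in (0, 1),
# so its internal sigmoid can only produce probabilities in (0.5, 0.731).
double = [bce_with_logits(sigmoid(x), 1.0) for x in logits]

for x, c, d in zip(logits, correct, double):
    print(f"logit={x:6.1f}  correct={c:8.4f}  double_sigmoid={d:8.4f}")
```

With the raw logit the loss ranges from about 10 down to nearly 0 across these inputs, whereas with the extra sigmoid it stays between roughly 0.31 and 0.69 for a target of 1, however confident the discriminator is.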

hfslyc commented 5 years ago

The D_loss is way too low. There must be some asymmetric statistics between your GT and pred so that D can easily tell them apart. The other possibility is that the adversarial loss is not being trained properly.

btw, when D_loss is low, the adv_loss should be pretty high. This part is a bit weird.
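The inverse relationship described here can be illustrated with a small sketch. It assumes the discriminator outputs a single probability p that a pixel comes from the ground truth, that the discriminator's loss on predictions is -log(1 - p), and that the generator's adversarial loss is -log(p). The function names are hypothetical and the code is plain Python, not the repo's training code.

```python
import math

def d_pred_loss(p):
    # Discriminator loss on a predicted map: it wants p -> 0 for predictions.
    return -math.log(1.0 - p)

def adv_loss(p):
    # Generator's adversarial loss: it wants the discriminator to say p -> 1.
    return -math.log(p)

# p is the (assumed scalar) probability the discriminator assigns to
# "this prediction is ground truth".
for p in [0.5, 0.2, 0.05, 0.01]:
    print(f"p={p:4.2f}  D_pred_loss={d_pred_loss(p):7.4f}  adv_loss={adv_loss(p):7.4f}")
```

Under these assumptions the two losses move in opposite directions: both equal ln 2 (about 0.693) when the discriminator is undecided at p = 0.5, and as p shrinks the discriminator's loss falls toward 0 while the unweighted adversarial loss grows. Real training averages over many pixels, so the logged values need not match this per-pixel picture exactly.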

Pyten commented 5 years ago

Thank you! My GT and pred have the same form as in a usual segmentation task: the label is one-hot encoded from the BxHxW label tensor with 4 classes. I'll check my code again. What is strange is that although the loss in the first experiment was not normal, as the first figure above shows, it ended up with a better result than the experiment without adversarial training, whereas the second experiment gave a much worse result than training without adv.
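One possible source of asymmetric statistics, offered only as an assumption to check and not as a diagnosis of this run, is that a one-hot GT vector contains exact 0s and 1s while a softmax prediction never does. The sketch below uses plain Python lists for a single pixel with 4 classes; the helper names and the example logits are hypothetical.

```python
import math

NUM_CLASSES = 4  # matches the 4 classes mentioned in the thread

def one_hot(label, num_classes=NUM_CLASSES):
    # Encode an integer class label as a hard 0/1 vector.
    return [1.0 if c == label else 0.0 for c in range(num_classes)]

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

gt_pixel = one_hot(2)                          # hard vector, as fed to D for GT
pred_pixel = softmax([0.5, 1.0, 4.0, -1.0])    # soft vector, as fed to D for pred

print("GT  :", gt_pixel)
print("pred:", [round(v, 4) for v in pred_pixel])
```

Both vectors pick class 2, yet the GT entries are exactly 0 or 1 while every predicted entry lies strictly between them, a difference a discriminator could in principle exploit without looking at spatial structure.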

hfslyc commented 5 years ago

I see. I don't have any new suggestions other than looking into why adv_loss and D_loss are both low while they should be competing against each other.

Pyten commented 5 years ago

Thanks for your quick reply! Since the adv_loss shown in the 2nd figure is multiplied by 0.01, I am wondering whether the original value, about 1, is high enough compared with D_pred_loss, which is about 0.01. I think I need to find out why the D_pred_loss doesn't decrease as desired. Another problem I have run into is that when training with more than one GPU, the result is much worse and often fails to converge. I noticed that you trained with one GPU in your paper. Have you tried more than one GPU, or do you have any clue about that? Thanks again!

hfslyc commented 4 years ago

Hi, sorry for not following up in time. I'm closing this issue for now. Feel free to shoot me an email if you have any more questions regarding this work.