sacmehta / ESPNetv2

A light-weight, power efficient, and general purpose convolutional neural network
MIT License

Using sum of 2 losses #15

Closed jongmokim7 closed 5 years ago

jongmokim7 commented 5 years ago

Hi, thank you for your work and for sharing it.

I found that there are two outputs from EESPNet_Seg: output1 from level 4 (used in inference) and output2 from level 2 (used only during training).

I'm wondering why you use the sum of two losses:

loss1 = criterion(output1, target)
loss2 = criterion(output2, target)
loss = loss1 + loss2

1) What if we tried "loss = criterion(output1 + output2, target)" and used "output1 + output2" as the final segmentation output, similar to the "skip layer" used in FCN-8s? (A small sketch of the two variants is at the end of this comment.)

2) What if we used one more (another) output, from level 3?

If you have already tried those combinations, could you share the details and explain why you didn't use them? If not, what do you think about the idea, and what could we expect from it?
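To make the two variants concrete, here is a minimal sketch. It assumes both outputs are logits with the same shape (otherwise output2 would first need to be upsampled); the batch size, class count, and resolution are placeholders, not the actual ESPNetv2 values.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Placeholder logits standing in for the two EESPNet_Seg heads.
# Assumed shape (B, C, H, W); if the shapes differed, output2 would
# have to be upsampled to output1's resolution before fusing.
output1 = torch.randn(2, 20, 64, 64, requires_grad=True)
output2 = torch.randn(2, 20, 64, 64, requires_grad=True)
target = torch.randint(0, 20, (2, 64, 64))

# Variant used in this repo: sum of two separate losses (deep supervision).
loss_sum = criterion(output1, target) + criterion(output2, target)

# Variant proposed above: fuse the outputs first, FCN-8s skip-style,
# and also use the fused map as the final prediction.
fused = output1 + output2
loss_fused = criterion(fused, target)
```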

sacmehta commented 5 years ago

Sorry for the late response. I somehow missed this one.

1) We followed a PSPNet-like approach. However, you can use the one you are proposing; I don’t think it will make a huge difference.

2) Give it a try.

lqxisok commented 5 years ago

@jongmokim7 This question is very interesting! But to some extent, there is no proper explanation for it. In PSPNet, a weighted auxiliary loss (weight 0.4) is added to the training procedure, and in BiSeNet two more auxiliary losses (each with weight 1) are added to optimize the whole model. So you will see many excellent models use this training strategy, but how should we understand it?
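For example, a minimal sketch of that weighted-auxiliary-loss pattern (the 0.4 weight is the PSPNet setting mentioned above; the function and variable names are just placeholders, not code from either paper):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def training_loss(main_out, aux_out, target, aux_weight=0.4):
    """Main loss plus a weighted auxiliary loss, PSPNet-style.

    At inference time only main_out is used; aux_out (and its loss term)
    exists purely to inject extra gradient during training.
    """
    return criterion(main_out, target) + aux_weight * criterion(aux_out, target)
```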

It should have at least some benefits:

  1. Increased gradient. The gradient, in general, gradually guides the model toward an optimum. In this case, if we add one more loss, the model gets a gradient that is almost twice as large as the original, which seems to help it learn. After training for a long time, the model's performance stabilizes, and you will be surprised to find that the loss is also almost twice as large as the original. Haha, that may or may not be true, but what you need to keep in mind is that a better model gets its better performance mainly from representation capacity, not from more and more losses, I think.
  2. More context information, or gathering more information? I'm not very clear about that, but I suggest we compare more cases around this issue. Here I list some experiments to help understand it (a rough sketch follows the list):
    1. First, train the model with only one loss (we can view this as the baseline).
    2. Train the model with two or more losses jointly.
    3. Pre-train a part of the model and then add it to the whole model for training (to verify whether the auxiliary loss helps performance).
    4. ··· (you can imagine any comparable experiment here.) So what I want to say is that when you add one more loss to your model, you may face a collaborative-optimization question.
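A rough sketch of how experiments 1 and 2 above could be wired up (purely illustrative; the function and its defaults are assumptions, not code from this repo):

```python
def compute_loss(outputs, target, criterion, mode="joint", weights=(1.0, 1.0)):
    """outputs: list of segmentation logits, e.g. [level4_out, level2_out].

    mode == "baseline": only the main output contributes to the loss (experiment 1).
    mode == "joint":    every output gets its own weighted loss term (experiment 2),
                        i.e. the setup discussed in this thread.
    """
    if mode == "baseline":
        return criterion(outputs[0], target)
    return sum(w * criterion(o, target) for o, w in zip(outputs, weights))
```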

I'm also a learner of DL, focusing on semantic segmentation. Here, I agree with @sacmehta: you should give your ideas a try.