lovasz loss - Githubissues

margokhokhlova commented 5 years ago

Hello! Thank you very much for sharing your code. I would really appreciate it if you could give me some advice here. I am trying to use your code with a slightly modified model (the encoder and the number of filters in the unet are changed) on a semantic segmentation problem. It all works fine until I change the loss to lovasz (I also switch to the corresponding model, but keep the weights pre-trained with bce_dice). At that point the performance drops instead of going up. Can it be only due to my data? I use SGD with lr 0.001.

ybabakhin commented 5 years ago

Hi! Do you use the Keras part of the code or the PyTorch one?

I think you're using a valid approach. I will just repeat the correct one here in case I haven't understood yours.
1) Train a model using bce-dice and the initial architecture. Save the weights.
2) Create a new model, deleting the sigmoid activation from the last layer, as it's done here.
3) Load the weights from 1) into the new model.
4) Change the loss to lovasz and continue training.
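Steps 2) and 3) can be illustrated without any framework. This is only a toy sketch: `TinyHead` is a hypothetical stand-in for the real network (one linear unit per pixel), and the weight values are made up. It shows that loading the same weights into a model without the final sigmoid gives raw logits, and that applying the sigmoid afterwards recovers the original probabilities.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyHead:
    """Hypothetical stand-in for a segmentation model (not the repo's code)."""

    def __init__(self, use_sigmoid):
        self.use_sigmoid = use_sigmoid
        self.weights = {"w": 0.0, "b": 0.0}

    def predict(self, pixels):
        raw = [self.weights["w"] * x + self.weights["b"] for x in pixels]
        return [sigmoid(z) for z in raw] if self.use_sigmoid else raw


# Step 1: model with sigmoid, "trained" with bce-dice (weights are made up).
stage1 = TinyHead(use_sigmoid=True)
stage1.weights = {"w": 2.0, "b": -1.0}
saved_weights = dict(stage1.weights)

# Steps 2 and 3: same architecture without the sigmoid, same weights loaded.
stage2 = TinyHead(use_sigmoid=False)
stage2.weights = dict(saved_weights)

pixels = [-1.0, 0.0, 0.5, 2.0]
probs = stage1.predict(pixels)    # values in (0, 1)
logits = stage2.predict(pixels)   # arbitrary real numbers
recovered = [sigmoid(z) for z in logits]
```

In a real Keras setup the same idea applies to the full weight set: the sigmoid has no parameters, so removing it does not change which weights need to be loaded.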

In such a setting everything should train correctly. The question is: how do you compare the performance in 1) and 4)?

The loss values can't be used for the comparison, because the losses are different in each model.
If you're using a custom metric, note that model 4) returns arbitrary real numbers, not probabilities. That's why it's better to center the predictions at 0 instead of 0.5 (like it's done here).
Finally, if you are testing the quality on a holdout set after training, you should use the architecture from 1) again. That way the output of model 4) is passed through the sigmoid to obtain approximate probabilities.
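A small sketch of why the metric matters, using made-up logits and labels and a hypothetical `binary_accuracy` helper (not the metric from the repo). Thresholding logits at 0 is equivalent to thresholding sigmoid outputs at 0.5, while thresholding raw logits at 0.5 can report a lower score for the very same predictions.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_accuracy(preds, labels, threshold):
    """Fraction of pixels where (pred > threshold) matches the 0/1 label."""
    hits = sum(int((p > threshold) == bool(y)) for p, y in zip(preds, labels))
    return hits / len(labels)


# Made-up outputs of the sigmoid-free model and their ground-truth labels.
logits = [-3.0, -1.0, 0.4, 3.0]
labels = [0, 0, 1, 1]
probs = [sigmoid(z) for z in logits]

acc_probs = binary_accuracy(probs, labels, threshold=0.5)         # sigmoid model
acc_logits_zero = binary_accuracy(logits, labels, threshold=0.0)  # correct for logits
acc_logits_half = binary_accuracy(logits, labels, threshold=0.5)  # wrong for logits
```

Here the pixel with logit 0.4 is a correct positive (its probability is above 0.5), but a 0.5 threshold applied directly to the logit counts it as a miss.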

margokhokhlova commented 5 years ago

Thank you very much for your detailed answer! After reading it, I guess my monitoring metric might be wrong: I use binary accuracy as I did with bce_loss, so I probably need to add an activation there. I will try to modify it the way it is done with the lb one in your code.

For the rest, I do exactly that, and I use the code from Bes (Keras-based). The only possible difference is that I use the original Unet from the segmentation_models package, without the modifications made here (you introduce cse_block & sse_block there), but I guess that should not be an issue?

ybabakhin commented 5 years ago

Original Unet from the segmentation_models package should be ok. Just make sure that you do not use sigmoid activation for the lovasz model.
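To see why the sigmoid has to go, here is a minimal pure-Python sketch of the standard binary Lovasz hinge on a flattened image. It is an illustration under assumptions, not the repository's implementation (which may differ in details), and the inputs are made up. The hinge term `1 - logit * sign` can only reach zero when the outputs are unbounded; sigmoid outputs stay in (0, 1), so every pixel keeps a positive error and the loss cannot reach zero.

```python
import math


def lovasz_grad(gt_sorted):
    """Gradient of the Lovasz extension of the Jaccard loss for sorted errors."""
    if not gt_sorted:
        return []
    total = sum(gt_sorted)
    jaccard, cum_pos, cum_neg = [], 0, 0
    for g in gt_sorted:
        cum_pos += g
        cum_neg += 1 - g
        intersection = total - cum_pos
        union = total + cum_neg
        jaccard.append(1.0 - intersection / union)
    # Differences between consecutive Jaccard values.
    return [jaccard[0]] + [jaccard[i] - jaccard[i - 1] for i in range(1, len(jaccard))]


def lovasz_hinge_flat(outputs, labels):
    """Binary Lovasz hinge; `outputs` are expected to be raw logits."""
    errors = [1.0 - z * (2 * y - 1) for z, y in zip(outputs, labels)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    grad = lovasz_grad([labels[i] for i in order])
    return sum(max(errors[i], 0.0) * g for i, g in zip(order, grad))


labels = [1, 0, 1, 0]
logits = [5.0, -5.0, 3.0, -4.0]                      # confident, all correct
probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # same predictions after sigmoid

loss_on_logits = lovasz_hinge_flat(logits, labels)  # all margins satisfied
loss_on_probs = lovasz_hinge_flat(probs, labels)    # margins can never be met
```

With these inputs the logits give a loss of exactly zero, while the same predictions squashed through a sigmoid still produce a clearly positive loss.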

margokhokhlova commented 5 years ago

Thank you very much for your help! Yes, it works, I am an idiot, I am closing the issue! And just out of curiosity, did the introduction of the sse block improve the results significantly?

ybabakhin / kaggle_salt_bes_phalanx

lovasz loss #6