trypag / NonAdjLoss


Implementation of Lambda Control #3

Closed · russchua closed this issue 4 years ago

russchua commented 4 years ago

Good day dear Ganaye,

Thank you for authoring your paper 'Removing Segmentation Inconsistencies with Semi-Supervised Non-Adjacency Constraint'. It has provided helpful inspiration for a research project I am currently working on. The version of your paper that I am referring to is available here: https://hal.archives-ouvertes.fr/hal-02275956/document

I have questions about the implementation of NonAdjLoss as specified in your paper, as well as the implementation you have provided in this GitHub repository. I hope you can help me clarify them.

  1. In your paper you write:

> While training, if the validation-set Dice is steady or improving, λ is increased by λ_increase every n update epochs. Conversely, if the Dice falls more than ϵ below that of the initial unconstrained iteration, λ is rolled back to a lower value and the step size λ_increase is also reduced.

That is, you update λ when epoch mod n == 0.

Based on my understanding, you are referring to the DICE metric here, not the DICE loss. I would like to confirm this.

If it is the DICE metric, why would you divide a metric by a loss? I ask because the code in your torchmed repository appears to convert the adjacency matrix into a forbidden-adjacency matrix in order to compute a loss.

  2. NonAdjLoss in this repo: You set the self-tuning epoch to 100 and initialize λ as the ratio between Train_DICE and NonAdjLoss. You also set 'good_DICE' (perhaps L0' rather than L0) to the Val_DICE, and compare Val_DICE every epoch rather than optimizing/comparing Train_DICE values. Is this correct?

Also, rather than changing λ every n (= 5) epochs, you only change λ after 5 consecutive epochs without a sizeable drop in DICE. During the first 100 epochs (self.tuning_epoch), λ can only increase, whereas after that it can only decrease.

Then you define two kinds of ϵ: ϵ1 = 0.01 (Line 91) and ϵ2 = 0.02 (Line 88), which determine how λ is updated. When the DICE metric worsens by more than ϵ2, you reset the counter. When it worsens moderately (a drop between ϵ1 and ϵ2), you do not increase λ. In fact, λ is only increased when the DICE metric improves or worsens marginally (a drop below ϵ1), and only during the first 100 epochs (self.tuning_epoch). I have tried to capture this reading in the sketch below.
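
To make sure we are talking about the same thing, here is my reading of your policy as code. Everything here is my own reconstruction (the function and variable names are made up; only eps1, eps2, tuning_epoch, and the patience of 5 come from the lines cited above):

```python
# My reading of the repo's lambda policy (a hypothetical reconstruction for
# discussion, not the actual code from this repository).
def update_lambda(lam, lam_step, epoch, counter, val_dice, good_dice,
                  tuning_epoch=100, eps1=0.01, eps2=0.02, patience=5):
    drop = good_dice - val_dice              # distance below the reference DICE
    if drop > eps2:
        counter = 0                          # sizeable worsening: reset the counter
    elif drop <= eps1:
        counter += 1                         # improved, or only marginally worse
    # eps1 < drop <= eps2: keep lambda and the counter unchanged
    if epoch < tuning_epoch:
        if counter >= patience:              # 5 "stable" epochs in a row
            counter = 0
            lam += lam_step                  # early phase: lambda only increases
    elif drop > eps1:
        lam = max(lam - lam_step, 0.0)       # late phase: lambda only decreases
    return lam, counter
```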

If I understand correctly, your implementation differs from what is presented in the paper. If so, does this method of tuning NonAdjLoss give better results than the one described in the paper?

My apologies for the lengthy post, but I'm very grateful for your code and look forward to hearing from you at your convenience.

Thank you very much!

trypag commented 4 years ago

Hello @calciver, my answers below are based on this code.

> Based on my understanding, you are referring to the DICE metric here, not the DICE loss. I would like to confirm this.

Correct, this is the DICE metric; we only use it to monitor whether the segmentation quality gets worse. I think the DICE loss would also be a decent estimator.

> If it is the DICE metric, why would you divide a metric by a loss?

I don't understand why you say this; can you point to an example?

> I ask because the code in your torchmed repository appears to convert the adjacency matrix into a forbidden-adjacency matrix in order to compute a loss.

Correct, we only penalize forbidden relationships!
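
Roughly, the idea is something like the sketch below. It is a simplification (a single horizontal neighborhood direction, made-up names), not the actual torchmed code:

```python
import torch
import torch.nn.functional as F

def non_adj_loss(probs, forbidden):
    """probs: (B, C, H, W) softmax maps; forbidden: (C, C) binary mask with
    1 where two labels must never be adjacent."""
    # Soft co-occurrence of label c at a pixel and label d at its right neighbor.
    right = F.pad(probs, (0, 1))[..., 1:]                # shift left by one pixel
    adj = torch.einsum('bchw,bdhw->cd', probs, right)    # (C, C) soft adjacency
    adj = adj + adj.t()                                  # symmetrize
    return (adj * forbidden).sum()                       # penalize forbidden pairs only
```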

> You set the self-tuning epoch to 100 and initialize λ as the ratio between Train_DICE and NonAdjLoss. You also set 'good_DICE' (perhaps L0' rather than L0) to the Val_DICE, and compare Val_DICE every epoch rather than optimizing/comparing Train_DICE values. Is this correct?

Correct, we want to apply the NonAdjLoss as much as possible, but we don't know from the beginning by how much to tune it, so it's very empirical: we start from a reasonable value and increase it as long as the original validation DICE does not break.
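
In code, that initialization amounts to something like this (a sketch with made-up variable names, not the exact lines from this repo):

```python
# Start lambda so that both terms have a comparable magnitude, then let the
# control policy grow it while the validation DICE holds.
lam = train_dice / (nonadj_value + 1e-8)   # ratio of the DICE metric to NonAdjLoss
total_loss = seg_loss + lam * nonadj_value
```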

> Also, rather than changing λ every n (= 5) epochs, you only change λ after 5 consecutive epochs without a sizeable drop in DICE. During the first 100 epochs (self.tuning_epoch), λ can only increase, whereas after that it can only decrease.

Correct, you don't want to increase lambda if there is some kind of instability. At some point, after lambda has been increased enough, the penalization is strong enough that the DICE quality should be equivalent to the beginning of the training; if that's not the case, self.tuning_epoch is the parameter indicating the epoch from which we should stop increasing the penalization and maybe decrease lambda a little bit to get some DICE points back.

> If I understand correctly, your implementation differs from what is presented in the paper. If so, does this method of tuning NonAdjLoss give better results than the one described in the paper?

Correct, my implementation is different from the paper: more complicated, more parameters. We didn't want the paper to focus this much on the control algorithm, since it was not the point. This implementation is the result of my experience building an automatic policy for controlling the NonAdjLoss. The hardest part is not increasing lambda too much, because that can break the optimization. I am sure you can implement something simple in the same spirit :)
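
For example, something along these lines would follow the paper's simpler rule (a hypothetical sketch with made-up names, not code from this repo; the halving factor is my assumption):

```python
# Simple paper-style lambda control: raise lambda every n epochs while the
# validation DICE holds; roll back and shrink the step when it breaks.
def simple_lambda_control(lam, lam_increase, epoch, n,
                          val_dice, baseline_dice, eps):
    if val_dice < baseline_dice - eps:
        # Quality dropped too far: roll lambda back and reduce the step size.
        lam = max(lam - lam_increase, 0.0)
        lam_increase *= 0.5              # assumed reduction factor
    elif epoch % n == 0:
        # Quality steady or improving: increase lambda every n epochs.
        lam += lam_increase
    return lam, lam_increase
```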