Open jg10545 opened 4 years ago
Both of the solutions below seem to work reasonably well. Going with the second for now (with the lower bound as a hyperparameter), since it keeps the classification loss on the same scale (so changing the hyperparameter doesn't also force a change to `class_loss_weight`).
When I run on large images with sparse objects, I find that there's a lot of unnecessary masking going on.
My hypothesis is that some of the issue is coming from the form of the classification loss: by using the negative log-prob from the classifier output, an update that changes a classification from 1.0 to 0.1 has the same "reward" as an update that changes a classification from 0.01 to 0.001. This means that the mask generator can "improve" under this loss function by masking out parts of images that the classifier was already pretty sure didn't contain an object.
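The arithmetic behind this: equal multiplicative drops in `p` produce equal jumps in `-log(p)`, no matter where `p` started. A quick pure-Python check (variable names are mine, for illustration):

```python
import math

def nll(p):
    """Negative log-probability, the classification-loss form in question."""
    return -math.log(p)

# "Reward" to the mask generator = how much the classification loss rises.
reward_confident = nll(0.1) - nll(1.0)       # classifier went from certain to unsure
reward_already_low = nll(0.001) - nll(0.01)  # classifier was already nearly sure there was no object

# Both equal ln(10) ~= 2.303: suppressing an already-low probability
# pays exactly as well as masking a genuine detection.
print(reward_confident, reward_already_low)
```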
Possible solutions:

- Replace `-1*log(p + epsilon)` with `1-p` (e.g. use raw probabilities for the loss)
- Set `epsilon` to something like 0.01 (basically putting a hard bound on how low the classification can go before we stop rewarding it)
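A minimal sketch of the two candidate losses in plain Python (function names are mine, not from the codebase):

```python
import math

def raw_prob_loss(p):
    # Solution 1: linear in p, so pushing p from 0.01 down to 0.001
    # earns only 0.009 -- no incentive to over-mask easy negatives.
    return 1.0 - p

def clamped_log_loss(p, epsilon=0.01):
    # Solution 2: once p falls well below epsilon, the loss saturates
    # near -log(epsilon), so further suppression stops being rewarded.
    return -math.log(p + epsilon)

# With epsilon = 0.01, driving p from 0.01 down to 0.001 gains only
# ln(0.02/0.011) ~= 0.6, versus ln(10) ~= 2.3 for the unclamped -log(p).
gain = clamped_log_loss(0.001) - clamped_log_loss(0.01)
print(round(gain, 3))
```

Either way the gradient toward already-suppressed regions shrinks; the `epsilon` version just does it while keeping the loss on its original log scale.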