valeoai / ConfidNet

Addressing Failure Prediction by Learning Model Confidence

ConfidNet Failure Cases & Generalization #7

Closed ssakhavi closed 3 years ago

ssakhavi commented 3 years ago

Hi,

I've been admiring the paper a lot, especially its ability to estimate confidence post-training.

I have a few questions:

  1. What were some of your failure cases? Were there situations where the method didn't work?
  2. What if the network is overfitting, so that the confidence value is always saturated at 1? How can you make sure the network produces a reasonable confidence level?
  3. Is it safe to say that, if you know the architecture of a model, it is possible to train ConfidNet as a branch for it? What are the limitations?
chcorbi commented 3 years ago

Hi @ssakhavi

Thank you for your interest in the paper!

  1. As noted in the paper, many modern neural networks are subject to overfitting in image experiments. Not a failure case per se, but while ConfidNet improves over MCP, we are still far from reaching the TCP target in terms of performance (AUPR, AUC). We empirically observe that the approach works well on small networks/datasets such as in the MNIST and SVHN experiments. In a recent paper extending ConfidNet, under review at TPAMI (https://arxiv.org/abs/2012.06508), we had to adapt the architecture of ConfidNet to suit a DeepLab segmentation model. I think the point of this work is more about using an auxiliary network to learn a statistic of the main classifier (TCP) that makes it possible to better distinguish failures from correct predictions (see the sketch after this list).

  2. In section 3.3 of the paper, we indeed tackle this issue. The number of failures available in training is clearly key to training ConfidNet well. Our experiments show that even with a small number of failures in the training set, ConfidNet still outperforms other uncertainty baselines. Nevertheless, to improve further, I think providing or generating more failures to mitigate the imbalanced-dataset issue could be a solution.

  3. Yes, ConfidNet is model- and task-agnostic. In the paper we conducted experiments in both classification and semantic segmentation. As mentioned in the previous points, depending on the network used (VGG, ResNet, DeepLab), you may need to adapt the architecture of ConfidNet to better suit the main model.
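To make the idea in point 1 concrete, here is a minimal PyTorch sketch of an auxiliary confidence branch trained to regress the TCP of a frozen classifier. Everything here (`backbone`, `classifier`, the layer sizes) is an illustrative assumption, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidNetBranch(nn.Module):
    """Auxiliary confidence head (illustrative sizes, not the repo's exact layers)."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 400), nn.ReLU(),
            nn.Linear(400, 400), nn.ReLU(),
            nn.Linear(400, 1), nn.Sigmoid(),  # confidence score in [0, 1]
        )

    def forward(self, features):
        return self.head(features).squeeze(-1)

def tcp_target(logits, labels):
    """True Class Probability: softmax probability of the ground-truth class."""
    probs = F.softmax(logits, dim=1)
    return probs.gather(1, labels.unsqueeze(1)).squeeze(1)

def confidnet_step(backbone, classifier, branch, optimizer, images, labels):
    """One training step: the main model is frozen, only the branch learns."""
    with torch.no_grad():  # keep the main classifier's predictions fixed
        features = backbone(images)
        logits = classifier(features)
        target = tcp_target(logits, labels)
    confidence = branch(features)
    loss = F.mse_loss(confidence, target)  # regress TCP, as in the paper
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

For a segmentation model such as DeepLab, the same idea would apply per pixel, with convolutional layers in the branch instead of fully connected ones.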

ssakhavi commented 3 years ago

Thanks for the reply, Charles.

Yes. One of the main questions I had (and I think your explanation in the paper answered it) is that TCP is almost always fully confident on the training data, so only a limited number of failure samples are available for training. Maybe an augmentation approach would be another way of forcing the network not to be overconfident? (https://paperswithcode.com/paper/misclassification-detection-via-class)
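For what it's worth, here is a rough sketch of that augmentation idea: mix-up blends random pairs of inputs, producing samples on which the classifier is more likely to err, enriching the pool of failure examples. The helper below is hypothetical, not taken from either paper.

```python
import torch

def mixup_batch(images, labels, alpha=0.4):
    """Hypothetical helper: blend random pairs of examples with mix-up.
    Mixed inputs are often misclassified, supplying more failure-like
    (low-TCP) samples for training the confidence branch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    # Return both label sets so a TCP-style target can be computed
    # against whichever class the mixed sample is graded on.
    return mixed, labels, labels[perm], lam
```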

I also contacted the authors of the confidence-aware training paper, which uses a ranking loss to estimate confidence, and corresponded with them about how a post-training version of their algorithm could improve on ConfidNet (https://paperswithcode.com/paper/confidence-aware-learning-for-deep-neural).

But in general, I think that until confidence-aware model design or training becomes a trend, auxiliary networks for post-training confidence estimation are the way to go.

Thanks


chcorbi commented 3 years ago

Thank you for the pointer, I'll look into that.

Indeed, using data augmentation such as mix-up to generate more failures was something I had vaguely considered. Glad to see it implemented and that it works!

Hope your project goes well :)