valeoai / ConfidNet

Addressing Failure Prediction by Learning Model Confidence

Question about the accuracy of training vgg16 on CIFAR10 dataset? #3

Closed GWwangshuo closed 4 years ago

GWwangshuo commented 4 years ago

Hi, thanks for your really interesting work. I tried to run your code to train VGG-16 on the CIFAR-10 dataset. However, the final accuracy on the test set is not as good as yours: my best classification accuracy on the CIFAR-10 test set is only 89.32% with the same configuration as yours. Could you give me some advice on improving this accuracy?

Moreover, I tried to modify your implementation of VGG-16 and tuned the learning rate and weight decay to reach the same test accuracy. However, when fine-tuning ConfidNet from this pretrained model, I still cannot reproduce your FPR-at-95%-TPR, AUPR-Error, AUPR-Success, and AUC values. Why?

chcorbi commented 4 years ago

Hi, thank you for the feedback. In the paper, the classification model is selected using validation-set accuracy. If you tested with the model from the last epoch, that may explain the difference.
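For reference, a minimal sketch of that selection scheme (`train_one_epoch`, `evaluate`, and the checkpoint path are hypothetical placeholders, not the repo's API):

```python
import torch

def train_with_val_selection(model, num_epochs, train_one_epoch, evaluate,
                             ckpt_path="best_model.pth"):
    # train_one_epoch / evaluate are hypothetical callables supplied by
    # the user; evaluate(model) returns validation accuracy.
    best_val_acc = 0.0
    for epoch in range(num_epochs):
        train_one_epoch(model)
        val_acc = evaluate(model)
        if val_acc > best_val_acc:
            # Keep the best-validation checkpoint, not the last epoch.
            best_val_acc = val_acc
            torch.save(model.state_dict(), ckpt_path)
    # Reload the selected checkpoint before testing.
    model.load_state_dict(torch.load(ckpt_path))
    return model
```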

Regarding the metrics, they are sensitive to the model used and to the error/success partition it creates. A model with a lower test accuracy produces more errors in the test set; as a result, if an error sample isn't well ranked, its impact on the metric is reduced when there are more errors. That's why, in the paper, I make sure to compare the various confidence measures (MCP, TrustScore, MCDropout, ConfidNet) using the same classification model.
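To make the dependence on the error/success partition concrete, here is a hedged sketch of AUPR-Error using scikit-learn (errors as the positive class, ranked by low confidence):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def aupr_error(confidence: np.ndarray, correct: np.ndarray) -> float:
    """confidence: per-sample scores; correct: True where the classifier
    was right. Negating confidence makes low-confidence samples rank
    first, as required when errors are the positive class."""
    errors = ~correct  # positives = misclassified samples
    return average_precision_score(errors, -confidence)
```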

If needed, more details about the implementation and hyper-parameters are provided in the supplemental: https://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence

GWwangshuo commented 4 years ago

@chcorbi Thanks for your reply. I have tried to re-implement your method myself. However, after training a good classifier on CIFAR-10, I cannot obtain a good ConfidNet by freezing the feature extractor and only fine-tuning the last fully connected layer in ConfidNet. In my experiment, ConfidNet tends to converge fast (in only a few epochs) and ends up predicting around 0.9 for everything.

At test time, the predicted true-class probability is also around 0.9 even for samples that are misclassified. Could you give me some hints to explain this phenomenon? To be specific, I drew the figure below:

[figure: confidence histograms, baseline (left) vs. ConfidNet (right)]

The left figure shows the confidence distribution of the baseline trained without ConfidNet, while the right one shows the model trained with ConfidNet. It turns out that ConfidNet overfits easily. Could you please give me some suggestions on how to fine-tune ConfidNet? I really appreciate it. Thanks.

chcorbi commented 4 years ago

Did you re-implement it from scratch? If so, be careful in PyTorch that your feature-extractor parameters are indeed set to requires_grad=False during training, as done by the freeze_layers() function in the SelfConfidLearner class. Also deactivate the dropout layers to avoid unwanted stochastic effects.
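For a from-scratch re-implementation, a minimal sketch of this freezing step (the "uncertainty" module prefix is an assumption for illustration, not the repo's exact code):

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module) -> None:
    # Freeze every parameter outside the confidence head. The
    # "uncertainty" prefix is an assumed naming convention; match it
    # to your own module names.
    for name, param in model.named_parameters():
        if not name.startswith("uncertainty"):
            param.requires_grad = False
    # Put dropout layers in eval mode so the frozen backbone behaves
    # deterministically. Note: model.train() re-enables them, so
    # re-apply this after every call to model.train().
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.eval()
```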

> only fine-tuning the last fully connected layer in ConfidNet

In this implementation, ConfidNet consists of 5 fc layers added on top of the penultimate layer of the original model. If you are using only 1 fc layer for ConfidNet, that may explain the drop in confidence estimation.
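For illustration, a minimal sketch of such a head (the hidden width of 400 and exact layer sizes are assumptions; see the supplemental for the actual architecture):

```python
import torch
import torch.nn as nn

class ConfidNetHead(nn.Module):
    """Sketch of a 5-fc-layer confidence head on top of the
    penultimate features, ending in one scalar per sample."""
    def __init__(self, in_features: int, hidden: int = 400) -> None:
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted confidence (TCP estimate)
        )

    def forward(self, penultimate: torch.Tensor) -> torch.Tensor:
        return self.head(penultimate)
```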

> At test time, the predicted true-class probability is also around 0.9 even for samples that are misclassified.

Using True Class Probability (TCP) as the confidence measure, your misclassified samples should rather have low values, as in Fig. 1 of the paper. If you don't get this kind of figure for TCP, you may have a problem in your code; this will certainly affect ConfidNet training, since TCP is the target value during confidence training. Regarding the ConfidNet figure, it won't be as good as TCP for sure, but it should fall somewhere between the TCP figure and the MCP figure.
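For reference, computing the TCP target from logits and ground-truth labels is just a gather on the softmax output; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def true_class_probability(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """TCP: the softmax probability assigned to the ground-truth class.
    Misclassified samples should mostly get low TCP values."""
    probs = F.softmax(logits, dim=1)
    return probs.gather(1, targets.unsqueeze(1)).squeeze(1)
```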

GWwangshuo commented 4 years ago

@chcorbi Thanks for your reply.

1. Actually, I re-implemented it by referring to your code. I added another 5 fc layers, froze all layers, and deactivated the dropout layers in the feature extractor. However, I cannot train a good ConfidNet that produces a distribution figure similar to yours shown above.

2. I have also tried running your code many times over the past week.

3. Moreover, I tried using your pretrained model on the CIFAR-10 dataset from here to draw the following distribution figures. It seems your pretrained model has the same problem.

[figure: confidence distributions from the pretrained model, MCP (left) vs. TCP (right)]

The left figure shows the Maximum Class Probability and the right figure corresponds to the True Class Probability. Could you please verify this, or share the code you used to draw your histograms? Thanks a lot.

Did you draw the previous TCP figure using the ground truth? Did ConfidNet really learn to predict TCP on the test set? Thanks.

chcorbi commented 4 years ago

The distribution plot presented in the paper corresponds to a comparison between MCP and the TCP criterion. ConfidNet is trained to match that TCP criterion on the training dataset. Given the results obtained, when drawing the plot associated with ConfidNet, you will find something between the MCP plot and the TCP plot, in practice closer to MCP.

Your plot seems accurate, comparing here MCP and ConfidNet. The error distribution has been slightly shifted toward lower values while success predictions keep high values. If you measure AP_errors, you should find that ConfidNet improves over MCP.

To help visualize, I added a notebook to plot success/errors histogram in commit https://github.com/valeoai/ConfidNet/commit/e94bd89c54df4e135626bdb485cb4467bc4048e9
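For anyone landing here later, a minimal sketch in the same spirit as that notebook (not its exact code):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_success_error_hist(confidence: np.ndarray, correct: np.ndarray) -> None:
    """confidence: per-sample scores (MCP, TCP, or ConfidNet output);
    correct: True where the classifier prediction was right."""
    bins = np.linspace(0.0, 1.0, 30)
    plt.hist(confidence[correct], bins=bins, alpha=0.5, label="Successes")
    plt.hist(confidence[~correct], bins=bins, alpha=0.5, label="Errors")
    plt.xlabel("Confidence")
    plt.ylabel("Count")
    plt.legend()
    plt.show()
```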

GWwangshuo commented 4 years ago

@chcorbi Thanks for your help. Closing this issue since everything is clear now.