Hi, thank you for the feedback. In the paper, the classification model is selected using validation set accuracy. If you tested with the model from the last epoch, that may explain the difference.
Regarding the metrics, they are sensitive to the model used and the error/success partition it creates. A model with a lower reported test accuracy leads to more errors in the test set. As such, if an error sample isn't well ranked, its impact on the metric will be reduced when there are more errors. That's why, in the paper, I make sure to compare the various confidence measures (MCP, TrustScore, MCDropout, ConfidNet) using the same classification model.
If needed, more details about the implementation and hyper-parameters are provided in the supplemental: https://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence
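For what it's worth, a minimal sketch of this kind of checkpoint selection (the train_one_epoch and evaluate callables are placeholders, not functions from this repo):

```python
import torch

def train_with_model_selection(model, train_loader, val_loader, num_epochs,
                               train_one_epoch, evaluate,
                               ckpt_path="best_model.pth"):
    """Keep the checkpoint with the best validation accuracy, not the last epoch."""
    best_val_acc = 0.0
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)   # placeholder training step
        val_acc = evaluate(model, val_loader)  # placeholder validation accuracy
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            torch.save(model.state_dict(), ckpt_path)
    return best_val_acc
```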
@chcorbi Thanks for your reply. I have tried to re-implement your method myself. However, after training a classifier with good accuracy on the cifar10 test set, I cannot obtain a well-performing confidnet by freezing the feature extractor and only finetuning the last fully connected layer in confidnet. In my experiment, confidnet tends to converge fast (in only a few epochs) and ends up predicting around 0.9.
During testing, the true class probability is also around 0.9 for all samples that are incorrect. Could you give me some hints to explain this phenomenon? To be specific, I drew the figure below: the left plot is the distribution of the baseline trained without confidnet, while the right plot shows the distribution obtained with confidnet. It turns out that confidnet overfits easily. Please give me some suggestions about how to finetune confidnet, I really appreciate it. Thanks.
Did you re-implement it from scratch? If so, be careful in PyTorch that your feature extractor layers are indeed set to requires_grad=False during training, as the freeze_layers() function in the SelfConfidLearner class does. Also deactivate the dropout layers to avoid unwanted stochastic effects.
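A minimal sketch of what that freezing step might look like in PyTorch (the "uncertainty" name prefix used to spot the ConfidNet layers is an assumption, not necessarily what freeze_layers() does):

```python
import torch.nn as nn

def freeze_for_confidence_training(model):
    """Freeze everything except the confidence head and deactivate dropout."""
    for name, param in model.named_parameters():
        if not name.startswith("uncertainty"):   # assumed prefix for ConfidNet layers
            param.requires_grad = False
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.eval()  # note: a later model.train() call re-enables dropout,
                           # so re-apply this after switching back to train mode
```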
only finetuning the last fully connected layer in confidnet
In this implementation, ConfidNet is made of 5 fc layers added on top of the penultimate layer of the original model. If you are using only 1 fc layer for ConfidNet, that may explain the drop in confidence estimation quality.
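For illustration, a sketch of such a 5-layer confidence head on top of the penultimate features (the 512 input dimension for VGG16 and the 400 hidden width are assumptions; check the repo for the exact sizes):

```python
import torch.nn as nn

class ConfidNetHead(nn.Module):
    """Small MLP regressing a scalar confidence from penultimate features."""
    def __init__(self, feat_dim=512, hidden_dim=400):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar confidence per sample
        )

    def forward(self, features):
        return self.layers(features)
```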
During test, the true class probability is also around 0.9 for all samples which are incorrect?
Using True Class Probability (TCP) as the confidence measure, your misclassified samples should rather have low values, as in the figure in the paper. If you don't get this kind of figure for TCP, you may have a problem in your code. This will certainly affect ConfidNet training, as TCP is the target value during confidence training. Regarding the ConfidNet figure, it won't be as good as TCP for sure, but it should be somewhere in between the TCP figure and the MCP figure.
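As a sanity check, a sketch of how the TCP target and the regression loss could be computed (whether a sigmoid is applied to the raw ConfidNet output is an implementation detail; adapt it to match the repo):

```python
import torch
import torch.nn.functional as F

def tcp_target(logits, labels):
    """Softmax probability assigned to the ground-truth class."""
    probs = F.softmax(logits, dim=1)
    return probs.gather(1, labels.unsqueeze(1)).squeeze(1)

def confidence_loss(confidence_pred, logits, labels):
    """MSE regression of the (sigmoid-squashed) confidence onto TCP."""
    target = tcp_target(logits, labels).detach()
    return F.mse_loss(torch.sigmoid(confidence_pred).squeeze(1), target)
```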
@chcorbi Thanks for your reply.
1. Actually, I re-implemented it by referring to your code. I added another 5 fc layers, froze all layers, and deactivated the dropout layers in the feature extractor. However, I cannot train a ConfidNet that produces a distribution figure similar to yours shown above.
2. I have tried your code many times over the past week, as follows:
Step 1: python3 train.py -c confs/exp_cifar10.yaml -f, with lr changed to 0.05, random_crop: 32, and a multi_step lr schedule.
Step 2: python3 train.py -c confs/selfconfid_classif.yaml -f, using the pretrained model from Step 1.
However, I still cannot obtain the same performance as yours, and I am really confused. So far, I can achieve good test accuracy on the cifar10 test set, but I am still stuck on ConfidNet.
Could you give me some suggestions on how to reach your performance, or how to obtain a histogram similar to yours on the cifar10 test set? I really appreciate your kind help!
3. Moreover, I tried to use your pretrained model on the cifar10 dataset from here to draw the following distribution figures. It seems your pretrained model has the same problem.
The left figure is the Maximum Class Probability and the right figure corresponds to the True Class Probability. Could you please verify this, or share the code you used to draw your histograms?
Thanks a lot.
Did you draw the previous TCP figure using the ground truth? Did ConfidNet really learn to predict TCP on the test set? Thanks.
The distribution plot presented in the paper corresponds to a comparison between MCP and the TCP criterion. ConfidNet is trained to match that TCP criterion on the training dataset. Given the results obtained, when drawing the plot associated with ConfidNet, you will find something between the MCP plot and the TCP plot, actually closer to MCP indeed.
Your plot seems accurate, comparing here MCP and ConfidNet. The error distribution has been slightly shifted towards lower values while success predictions keep high values. If you measure AP_errors, you should find that ConfidNet improves over MCP.
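For example, AP on the error class could be measured along these lines with scikit-learn (a sketch; the repo's metrics code may differ):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def ap_errors(confidence, correct):
    """AUPR with errors as the positive class; higher confidence should mean
    'more likely correct', so the score is negated for ranking errors."""
    errors = 1 - np.asarray(correct)  # 1 where the prediction is wrong
    return average_precision_score(errors, -np.asarray(confidence))
```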
To help visualize, I added a notebook to plot success/errors histogram in commit https://github.com/valeoai/ConfidNet/commit/e94bd89c54df4e135626bdb485cb4467bc4048e9
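The notebook is the reference, but a rough equivalent of the histogram plot looks like this (assuming numpy arrays of per-sample confidences and correctness flags):

```python
import matplotlib.pyplot as plt

def plot_success_error_hist(confidence, correct, title="confidence histogram"):
    """Overlay confidence histograms for correct and misclassified samples."""
    plt.hist(confidence[correct == 1], bins=50, alpha=0.5, density=True, label="successes")
    plt.hist(confidence[correct == 0], bins=50, alpha=0.5, density=True, label="errors")
    plt.xlabel("confidence")
    plt.ylabel("density")
    plt.title(title)
    plt.legend()
    plt.show()
```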
@chcorbi Thanks for your help. Close this issue since I am clear now.
Hi, thanks for your really interesting work. I tried to run your code to train vgg16 on the cifar10 dataset. However, the final accuracy on the test set is not as good as yours. My best classification performance on the cifar10 test set only reaches 89.32% with the same configuration as yours. Could you give me some help with improving this accuracy? Moreover, I tried to modify your implementation of vgg16 and tuned the lr and weight decay to reach the same test accuracy; however, when using this pretrained model to finetune confidnet, I still cannot get FPR-95%-TPR, AUPR-Error, AUPR-Success, and AUC values similar to yours. Why?