Closed: Yuchong-Geng closed this issue 3 years ago.

Hi,
I have a question about using the F1 score as a metric for CIFAR-10 instead of AUC. When I train the model with the example parameters, the AUC can go up to around 76%, but the F1 score stays around 20 across all epochs.
Can you please provide some insight into this problem?
I appreciate your time and help.
Hi, this depends on the score threshold you used for the F1 computation. Can you please share more details so that we can help you out?
Thanks
Thanks for the response.
So I am trying to run the CIFAR experiment as follows:
python3 main_cifar.py --lamda 1 --radius 8 --lr 0.001 --gamma 1 --ascent_step_size 0.001 --batch_size 256 --epochs 100 --optim 0 --normal_class 0
Instead of using the default "AUC" metric, I am using the F1 score as defined in the DROCCTrainer:
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Mark scores at or above the 20th-percentile threshold as the positive class
thresh = np.percentile(scores, 20)
y_pred = np.where(scores >= thresh, 1, 0)
prec, recall, test_metric, _ = precision_recall_fscore_support(
    labels, y_pred, average="binary")
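For reference, the default AUC metric is computed from the same raw scores without picking any threshold (a minimal sketch assuming scikit-learn, with scores and labels standing in for the test outputs above):

from sklearn.metrics import roc_auc_score

# AUC works on the raw scores directly; no threshold has to be chosen.
auc = roc_auc_score(labels, scores)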
Hi, we kept this threshold for the F1 score mainly for the tabular experiment results (since prior work in that domain reports F1 at a specific threshold). For images, we suggest looking at AUC scores only, since AUC does not depend on a single threshold but summarizes performance across all thresholds. This is also in line with prior work on images.
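To illustrate the difference, here is a minimal self-contained sketch (synthetic scores, not the DROCC evaluation code) showing that F1 swings with the percentile used to binarize the scores, while AUC is a single threshold-free summary:

import numpy as np
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

rng = np.random.default_rng(0)
# Synthetic setup: 10% positives that score higher on average than negatives
labels = np.concatenate([np.ones(100), np.zeros(900)])
scores = np.concatenate([rng.normal(2.0, 1.0, 100),
                         rng.normal(0.0, 1.0, 900)])

print("AUC:", roc_auc_score(labels, scores))  # one threshold-free number

# F1 changes substantially with the percentile used as the threshold
for pct in (10, 20, 50, 80, 90):
    thresh = np.percentile(scores, pct)
    y_pred = np.where(scores >= thresh, 1, 0)
    _, _, f1, _ = precision_recall_fscore_support(
        labels, y_pred, average="binary")
    print(f"percentile {pct}: F1 = {f1:.3f}")

In this sketch, a low percentile threshold flags most points as positive, which drags precision and hence F1 down even though the scores separate the two classes well and the AUC is high.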
Thanks
Hi @Yuchong-Geng,
Can I close the issue now?
Sure. Thanks for the help!