hendrycks / outlier-exposure

Deep Anomaly Detection with Outlier Exposure (ICLR 2019)
Apache License 2.0

Different results for newsgroups text classification #5

Closed nazim1021 closed 5 years ago

nazim1021 commented 5 years ago

Hi,

Thank you very much for the response. After the modifications you suggested, I was able to reproduce the results of the paper for the SST dataset.

However, there is a problem with the 20 Newsgroups dataset. When I evaluate your pretrained model, the results differ substantially from those reported in the paper. When we run the evaluation script on your model, we get the following:

OOD dataset mean FPR: 0.6828
OOD dataset mean AUROC: 0.7115
OOD dataset mean AUPR: 0.2773

So, what could be the issue? If the evaluation datasets are the same as the ones used for SST, I cannot see why your pretrained model does not reproduce the results in the paper. Which version of 20 Newsgroups did you use, and how did you do the train/test split? We use the 20 Newsgroups dataset available from the scikit-learn library.
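For reference, the metrics quoted above are standard OOD detection metrics computed from per-example anomaly scores (higher = more anomalous). A minimal self-contained sketch, using toy scores rather than actual model outputs, of how FPR at 95% TPR and AUROC are typically computed:

```python
# Hedged sketch of the usual OOD metric definitions; the score lists below
# are toy values for illustration, not outputs of the actual model.

def fpr_at_tpr(in_scores, out_scores, tpr_level=0.95):
    """FPR on OOD data at the threshold that keeps `tpr_level` of
    in-distribution examples below it (low score = in-distribution)."""
    in_sorted = sorted(in_scores)
    idx = min(int(tpr_level * len(in_sorted)), len(in_sorted) - 1)
    thresh = in_sorted[idx]
    false_pos = sum(1 for s in out_scores if s <= thresh)
    return false_pos / len(out_scores)

def auroc(in_scores, out_scores):
    """Probability that a random OOD example scores higher than a random
    in-distribution one (the Mann-Whitney U statistic, equivalent to AUROC)."""
    wins = sum(
        1.0 if o > i else 0.5 if o == i else 0.0
        for i in in_scores for o in out_scores
    )
    return wins / (len(in_scores) * len(out_scores))

in_scores = [0.1, 0.2, 0.15, 0.3, 0.25]   # toy in-distribution scores
out_scores = [0.4, 0.35, 0.5, 0.2, 0.6]   # toy OOD scores
print(round(auroc(in_scores, out_scores), 3))   # prints 0.9
print(fpr_at_tpr(in_scores, out_scores))        # prints 0.2
```

Note that large gaps in these metrics against the paper usually point at a dataset-version or preprocessing mismatch rather than the metric code itself.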

Thank you,

Originally posted by @AristotelisPap in https://github.com/hendrycks/outlier-exposure/issues/4#issuecomment-517967141

mmazeika commented 5 years ago

Hi,

We use the no-short version available at http://ana.cachopo.org/datasets-for-single-label-text-categorization. This is the bydate version, which has a standard train/test split. It also removes words shorter than 3 characters and a few bad examples (e.g. duplicate examples in the original bydate version). I uploaded this data and a reformatting script to the NLP_classification/20_newsgroups folder. Thank you for noticing this.
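For anyone loading those files directly: a minimal parser sketch, assuming the one-document-per-line `label<TAB>text` layout used by the datasets on that page (this is an assumption about the file format, not taken from the repo's reformatting script):

```python
# Hedged sketch: parse cachopo-style 20 Newsgroups files, assuming each line
# is "label<TAB>preprocessed text". Not the repo's actual reformatting script.

def parse_newsgroups(lines):
    """Return parallel lists of class labels and document texts."""
    labels, texts = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, _, text = line.partition("\t")
        labels.append(label)
        texts.append(text)
    return labels, texts

# Toy sample standing in for e.g. 20ng-train-no-short.txt
sample = [
    "alt.atheism\tthe quick brown fox",
    "sci.space\torbital mechanics are fun",
]
labels, texts = parse_newsgroups(sample)
print(labels)   # prints ['alt.atheism', 'sci.space']
```

Because the bydate split is fixed, reading the train and test files separately with a loader like this avoids the split-dependent variation you would get from re-splitting the sklearn version yourself.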

All the best,

Mantas

nazim1021 commented 5 years ago

Thank you so much. That helped.