Closed — YipingNUS closed this issue 4 years ago
Hi @YipingNUS ,
Thanks for your interest in our paper! I am glad that you were able to reproduce our results.
I suppose you are referring to the English model trained on the news dataset we provided, which contains about 100K news documents and 327 labels. The result you observe is not very surprising: the model has been trained on a relatively small number of documents and labels, so it is very difficult for it to do well in a zero-shot setting.
In fact, the news classification model has only been evaluated on low-resource labels, not on unseen ones, so there is no guarantee that it will work in the latter setting. To obtain better zero-shot performance you can use the model trained on 6.7M scientific documents and 26K labels, but please note that the domain is different.
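To make the zero-shot setting concrete: the model scores a document against label *descriptions* in a shared embedding space, so any label with a description can be ranked at inference time, seen or not. A minimal sketch of that scoring step (the toy 3-d vectors below stand in for real encoder outputs; the function names are hypothetical, not the repo's API):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_labels(doc_vec, label_vecs):
    """Rank candidate labels by similarity between the document
    embedding and each label-description embedding."""
    scores = {name: cosine(doc_vec, vec) for name, vec in label_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy embeddings standing in for encoder outputs.
doc = np.array([0.9, 0.1, 0.0])  # a document about art
labels = {
    "art":      np.array([1.0, 0.0, 0.0]),
    "politics": np.array([0.0, 1.0, 0.0]),
    "sports":   np.array([0.0, 0.0, 1.0]),
}
print(rank_labels(doc, labels)[0][0])  # -> art
```

The point of the sketch is that nothing restricts the candidate set at inference time; the question is only whether the learned embeddings place unseen label descriptions near the right documents.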
Generally, if you would like better results in the news domain, I would recommend the following:
I agree that the particular model you tested will struggle to generalize to unseen labels, for the reasons highlighted above. The paper you refer to does also make use of label descriptions, but it is applicable only to datasets with a small number of labels, and its notion of "zero-shot" is limited to predicting a single held-out sentiment label given four sentiment labels seen during training (whereas we target thousands of unseen labels with elaborate descriptions). Hence it is unclear whether their model would scale to large label sets. I suppose they devised the adversarial objective to cope with the bias caused by the small number of training examples and labels (~20K, ~5K).
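For reference, the adversarial objective in that line of work is typically implemented with a gradient-reversal trick: the discriminator tries to recover the (seen) category from the representation, while the encoder receives the sign-flipped gradient so it learns category-agnostic features. A minimal numpy sketch of just that reversal step (function names are illustrative, not from either codebase):

```python
import numpy as np

def grad_reverse_forward(x):
    """Identity in the forward pass: the discriminator sees the
    representation unchanged."""
    return x

def grad_reverse_backward(grad, lam=1.0):
    """Sign-flipped (and scaled) gradient in the backward pass, so the
    encoder is pushed to *confuse* the category discriminator."""
    return -lam * np.asarray(grad)

g = np.array([0.5, -0.2])
print(grad_reverse_backward(g))  # -> [-0.5  0.2]
```

With few training labels, this pressure to strip category-specific information is plausibly what keeps the representation usable on the one held-out label.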
Closing the issue for now. Feel free to re-open it if you have any further questions.
Hi @nik0spapp, I was able to retrain the model for the general categories and reproduce the results in the paper. The model seems to give reasonable predictions for seen categories. However, when I tried it on unseen labels, the accuracy was very poor. Below are the model's predictions for the following news article (I modified the code to predict for new documents and arbitrary labels, but I didn't touch the architecture or weights):
https://www.reuters.com/article/us-italy-art-klimt/italian-police-think-stolen-klimt-masterpiece-found-hidden-behind-ivy-idUSKBN1YF14I
The article is clearly about art. What I observed is as follows:
Do you think I'd get better results if I trained on specific labels, since some of them might be closer to the unseen ones? My concern is that, despite using the input-label embedding, the model is learning something category-specific that prevents it from generalizing. It also reminds me of the work below, where they added adversarial training to remove category-specific information from the model:
https://github.com/WHUIR/DAZER