Can this be used to predict on a test set without labels?

feng-1985 commented 5 years ago

For practical applications, how can it be used to predict on a test set without labels?

feng-1985 commented 5 years ago

In the eval step, I use the support labels and compare the predictions against the real query labels, but the accuracy is very low. Does that mean the model did not learn the class prototypes? Did I miss something?

fourseven471 commented 5 years ago

> In the eval step, I use the support labels and compare the predictions against the real query labels, but the accuracy is very low. Does that mean the model did not learn the class prototypes? Did I miss something?

In a real application, the test data have no labels, so how do we get the support set and the query set?

gaotianyu1350 commented 5 years ago

In a real application, you need to provide a few supporting instances for each relation (say 5 or 10 per relation); then you can perform inference on query instances without labels. This code is for model evaluation on FewRel, so we did not implement the real-application part.
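As a rough illustration of the workflow described above, here is a minimal sketch of prototype-based inference on unlabeled queries. It is not code from the FewRel repository: the function names `build_prototypes` and `predict` are hypothetical, the relation names are made up, and the 2-d vectors stand in for embeddings that a trained sentence encoder would produce.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(support):
    """Map {relation: [embedding, ...]} to {relation: prototype}.

    Each prototype is the mean of that relation's support embeddings.
    """
    return {rel: mean_vector(embs) for rel, embs in support.items()}

def predict(query_embedding, prototypes):
    """Return the relation whose prototype is nearest (Euclidean distance)."""
    return min(
        prototypes,
        key=lambda rel: math.dist(query_embedding, prototypes[rel]),
    )

# Assumption: toy embeddings; in practice these come from the trained encoder.
support = {
    "founder_of": [[0.9, 0.1], [1.1, -0.1]],  # a few labeled instances
    "born_in": [[-1.0, 0.2], [-0.8, -0.2]],
}
prototypes = build_prototypes(support)

# Query instances carry no labels; only the support set is labeled.
unlabeled_queries = [[0.7, 0.0], [-0.9, 0.1]]
predictions = [predict(q, prototypes) for q in unlabeled_queries]
print(predictions)  # ['founder_of', 'born_in']
```

The only labels needed at inference time are those of the handful of support instances per relation; the queries are assigned to whichever prototype they fall closest to.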

feng-1985 commented 5 years ago

@gaotianyu1350 Thanks for your response. Maybe I am confused about how few-shot learning is applied. In this few-shot setting for relation classification, the validation metric is very high, so methods such as prototypical networks seem very effective.

But in a real application scenario there are many labels (say 20), and some labels have only a few instances (say 5). I create the support set from these 20 labels, choosing 5 or 20 examples for each. For the query set, the labels are unknown; I only know that each label is one of these 20. I then use the model to predict each instance. In this kind of evaluation the metric is very low. Why is that? Thanks again.

feng-1985 commented 5 years ago

@fourseven471

> In a real application, the test data have no labels, so how do we get the support set and the query set?

In the few-shot setting, the support set and the query set both come from the test data. In an application, we can build the support set from the training data, use it as the support set for the test data, and predict the labels of the query set from it.

gaotianyu1350 commented 5 years ago

@bifeng It is understandable that the accuracy drops as the number of classes increases, and I think few-shot learning with a large number of relation types will be the next trend in this area. Since there are still unsolved challenges in the simple version of few-shot learning, it is meaningful to evaluate models in those settings.

Besides, I have run experiments on 20-way few-shot and the results are not bad (around 70% for 20-way 5-shot with prototypical networks).

feng-1985 commented 5 years ago

@gaotianyu1350 Thanks very much! I am just stumped by this scenario.

I have some doubts: in your 20-way few-shot experiments, do the query set and the support set come from the same validation set? If the support set combines the training set with the validation set's own support set, the accuracy is very low (😢), because the query's label has to be found in a larger support set.

Few-shot learning with a large number of relation types is what real applications need (I would be grateful for any references on applications of few-shot learning). Why does standard few-shot work always use settings like 5-way 1-shot? Is that useful for real applications?

gaotianyu1350 commented 5 years ago

Do you mean that the possible labels for your scenario are 64 relations from the training set and 20 new relations? That will make the score really low. If that is what you mean, I agree that this is a version of few-shot closer to real-world application and actually I have a paper under review that is about this topic.

Yet that does not mean that classic 5-way few-shot is meaningless. There are many machine learning tasks that do not fit real-world applications at all, but they still benefit research in these areas. Though 5-way few-shot is a toy case, it deserves thorough exploration, since techniques that work on 5-way can also be adapted to real-world applications. Another reason is that the very first few-shot models (in CV) were evaluated on 5-way 5-shot, 5-way 1-shot and so on, so later works adopted the same settings for easier comparison.
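To put the N-way settings mentioned in this thread in perspective, the short sketch below computes the random-guess baseline for each one. This is plain arithmetic (1/N for N equally likely classes), not a result from any experiment; the 84-way case is simply the 64 + 20 relations from the scenario above.

```python
def chance_accuracy(n_way):
    """Expected accuracy of uniform random guessing over n_way classes."""
    return 1.0 / n_way

# 5-way and 20-way are the settings discussed above; 84 = 64 + 20 relations.
for n_way in (5, 20, 84):
    print(f"{n_way}-way: {chance_accuracy(n_way):.1%}")
# 5-way: 20.0%
# 20-way: 5.0%
# 84-way: 1.2%
```

The baseline shrinks as the label space grows, so accuracy figures from different N-way settings are not directly comparable.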

feng-1985 commented 5 years ago

Thanks! Looking forward to your paper! If there are any relevant public papers, please share them.

fourseven471 commented 5 years ago

I do it the same way as you. The support set comes from the train set. For validation, the validation set is the query set and the train set is the support set. For testing, the unlabeled test set is the query set and the train set is the support set. Again, the results are very poor.

gaotianyu1350 commented 5 years ago

For validation, both the support set and the query set should come from the validation dataset. The same goes for the test phase. Note that the train, val and test datasets DO NOT share relation types.
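The protocol described above can be sketched as an episode sampler that draws both the support set and the query set from a single split. This is an illustrative sketch, not the data loader in this repository: `sample_episode`, `val_data` and the relation and sentence names are all hypothetical.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode from ONE split (e.g. validation).

    dataset: {relation: [instance, ...]} for that split only.
    Returns (support, query): support maps relation -> K instances,
    query is a shuffled list of (instance, true_relation) pairs.
    """
    relations = rng.sample(sorted(dataset), n_way)
    support, query = {}, []
    for rel in relations:
        # Draw K + Q distinct instances so support and query never overlap.
        picked = rng.sample(dataset[rel], k_shot + n_query)
        support[rel] = picked[:k_shot]
        query.extend((inst, rel) for inst in picked[k_shot:])
    rng.shuffle(query)
    return support, query

# Assumption: a toy validation split with 8 relations, 10 sentences each.
val_data = {
    f"val_rel_{i}": [f"val_rel_{i}_sent_{j}" for j in range(10)]
    for i in range(8)
}
rng = random.Random(0)
support, query = sample_episode(val_data, n_way=5, k_shot=2, n_query=3, rng=rng)
print(len(support), len(query))  # 5 15
```

Because the episode is drawn entirely from the validation relations, no relation seen during training appears in either the support set or the query set.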

fourseven471 commented 5 years ago

> For validation, both the support set and the query set should come from the validation dataset. The same goes for the test phase. Note that the train, val and test datasets DO NOT share relation types.

But in a real application, the test set has no labels and we do not know which label set the test instances belong to. So N must be the total number of relations.

gaotianyu1350 commented 5 years ago

On this question, please refer to my response above:

> Do you mean that the possible labels for your scenario are 64 relations from the training set and 20 new relations? That will make the score really low. If that is what you mean, I agree that this is a version of few-shot closer to real-world application and actually I have a paper under review that is about this topic.
>
> Yet that does not mean that classic 5-way few-shot is meaningless. There are many machine learning tasks that do not fit real-world applications at all, but they still benefit research in these areas. Though 5-way few-shot is a toy case, it deserves thorough exploration, since techniques that work on 5-way can also be adapted to real-world applications. Another reason is that the very first few-shot models (in CV) were evaluated on 5-way 5-shot, 5-way 1-shot and so on, so later works adopted the same settings for easier comparison.

thunlp / FewRel, issue #9