tdopierre / ProtAugment

Code for ProtAugment: Unsupervised diverse short-texts paraphrasing for intent detection meta-learning
Apache License 2.0

How do you split train/val/test? #5

Closed jzhang38 closed 2 years ago

jzhang38 commented 2 years ago

Hi, thanks for your work; I find it really interesting.

I am new to this domain. I notice that other works using HWU/Clinc often use a supervised learning setup (or 10 shots across the entire dataset). But in your work you split the dataset into train/val/test with mutually exclusive label sets, following a meta-learning setup. May I ask if you are the first to do this, or are there other works that did something similar?

Thank you and looking forward to your reply!

jzhang38 commented 2 years ago

In other words, for Table 1 in your paper, are you the first to do this, or is there a line of work that splits the data this way to perform meta-learning?

tdopierre commented 2 years ago

Hi,

Glad you enjoyed the work!

Indeed, I split the label set of each dataset into three parts: train, validation, and test. This follows the meta-learning scenario, where models are evaluated on their ability to learn. To simulate this scenario, the training and testing label sets are disjoint.
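To make the idea concrete, here is a minimal sketch of a label-disjoint split. It is only an illustration and not necessarily the code used in this repository: the function names (`split_labels`, `split_examples`), the split sizes, and the toy intents are all assumptions.

```python
import random


def split_labels(labels, n_train, n_val, seed=0):
    """Partition the distinct labels into disjoint train/val/test label sets.

    Hypothetical helper: the split sizes and the seed are illustrative choices.
    """
    unique = sorted(set(labels))          # sort first so the shuffle is reproducible
    if n_train + n_val >= len(unique):
        raise ValueError("not enough labels left for the test set")
    random.Random(seed).shuffle(unique)
    train = set(unique[:n_train])
    val = set(unique[n_train:n_train + n_val])
    test = set(unique[n_train + n_val:])
    return train, val, test


def split_examples(examples, label_sets):
    """Route each (text, label) pair to the split that owns its label."""
    return [[(t, l) for t, l in examples if l in labels] for labels in label_sets]


# Toy data (made up for illustration): five intents, one utterance each.
examples = [
    ("play some jazz", "music"),
    ("will it rain today", "weather"),
    ("set an alarm for 7", "alarm"),
    ("what is my balance", "balance"),
    ("book a table for two", "restaurant"),
]
label_sets = split_labels([l for _, l in examples], n_train=3, n_val=1)
train_ex, val_ex, test_ex = split_examples(examples, label_sets)
```

Because the split is done over labels rather than over examples, every intent seen at test time is unseen during training, which is what distinguishes this setup from the usual supervised split.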

This kind of splitting is common in the meta-learning field; see [1] (especially Fig. 1).

However, I have not found prior work on meta-learning on those specific datasets (i.e. HWU / Clinc), so I did the splitting myself.

Hope that answers your questions!

[1] https://lilianweng.github.io/posts/2018-11-30-meta-learning/