yaoxingcheng / TLM

ICML'2022: NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework
MIT License

Could you provide selected data for each dataset? #2

Closed: swj0419 closed this issue 2 years ago

swj0419 commented 2 years ago

Great work! I was wondering if the selected data for each dataset will be available soon. Thanks!

yaoxingcheng commented 2 years ago

Thank you for your interest in our work. All task datasets and the corresponding selected data have been uploaded to Hugging Face (links to each dataset are listed in README.md). Taking AGNews as an example, the uploaded files include two splits, small_external.csv and large_external.csv, which contain the data selected from Wiki+Book and from Wiki+Book+CCNews+OpenWeb+Stories, respectively.
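As a minimal sketch of reading one of these splits once a task's files have been downloaded, the helper below opens small_external.csv or large_external.csv from a local directory. The function name `load_external`, the `scale` argument, and the local-directory layout are hypothetical; the sketch does not assume any particular column names and simply returns each row keyed by the CSV header.

```python
import csv
from pathlib import Path

# Split file names as given in the reply above.
SPLITS = {
    "small": "small_external.csv",  # presumably the Wiki+Book selection
    "large": "large_external.csv",  # presumably the Wiki+Book+CCNews+OpenWeb+Stories selection
}


def load_external(data_dir, scale="small"):
    """Read one selected-data split from a local copy of a task's dataset files.

    `data_dir` is a hypothetical local directory holding files downloaded from
    the task's Hugging Face dataset page. Each row is returned as a dict keyed
    by the CSV header, so no column layout is assumed.
    """
    path = Path(data_dir) / SPLITS[scale]  # raises KeyError for an unknown scale
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, `load_external("agnews_data", "large")` would read `agnews_data/large_external.csv`, where `agnews_data` is a placeholder for wherever the files were saved.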

swj0419 commented 2 years ago

Thanks! Do you retrieve data only for examples in the task training set, or for the combined training, dev, and test sets?

yaoxingcheng commented 2 years ago

We retrieve data only for the training set.
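To illustrate what "only for the training set" means in practice, here is a toy sketch in which only training examples are used as retrieval queries, so dev and test examples never influence which external documents are selected. Everything here is an assumption for illustration: the whitespace tokenizer, the token-overlap score (a stand-in for whatever ranker TLM actually uses, not the authors' code), the top-`k` cutoff, and the function names are all hypothetical.

```python
from collections import Counter


def tokens(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def overlap_score(query, doc):
    # Toy relevance score: count of shared token occurrences.
    # Stand-in for a real ranker; not the method used in TLM.
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    return sum((q & d).values())


def retrieve_for_training_set(train_texts, corpus, k=2):
    """Return sorted indices of corpus documents retrieved with training examples as queries.

    Dev and test examples are deliberately not passed in, so they cannot
    affect which external documents are selected.
    """
    selected = set()
    for query in train_texts:
        # Rank every document by score (descending), breaking ties by index.
        ranked = sorted(
            ((overlap_score(query, doc), i) for i, doc in enumerate(corpus)),
            key=lambda pair: (-pair[0], pair[1]),
        )
        # Keep the top-k documents that share at least one token with the query.
        selected.update(i for score, i in ranked[:k] if score > 0)
    return sorted(selected)
```

The union over per-query results is one simple way to build a single selected set; a real pipeline would also need to handle scale (for example, an inverted index instead of scoring every document for every query).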