Open Christoforos00 opened 2 years ago
Hi Christoforos,
Thanks for your interest! Glad you liked the paper.
To create the different training, validation, and test files, I used a script from another repository of mine, which you can find here: prepate-intent-dataset.py
In short, you need a full.jsonl file containing all annotated samples, each row being a JSON dictionary with a "label" key. The script then separates the labels found in full.jsonl into three sets:
labels.train.txt
labels.valid.txt
labels.test.txt
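The label-splitting step could be sketched as follows. This is a minimal illustration, not the actual prepate-intent-dataset.py: the function name split_labels, the 60/20/20 ratios, and the seed handling are all assumptions.

```python
import json
import random

def split_labels(full_path, seed=1, ratios=(0.6, 0.2, 0.2)):
    """Sketch: partition the label set of full.jsonl into disjoint
    train/valid/test label files (episodic few-shot setup)."""
    # Collect the unique labels from the annotated file.
    with open(full_path) as f:
        labels = sorted({json.loads(line)["label"] for line in f})
    # Deterministic shuffle so a fixed seed reproduces the split.
    random.Random(seed).shuffle(labels)
    n = len(labels)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    splits = {
        "labels.train.txt": labels[:n_train],
        "labels.valid.txt": labels[n_train:n_train + n_valid],
        "labels.test.txt": labels[n_train + n_valid:],
    }
    # One label per line, matching the labels.*.txt layout described above.
    for name, subset in splits.items():
        with open(name, "w") as f:
            f.write("\n".join(subset) + "\n")
    return splits
```

The key property is that the three label sets are disjoint, so test classes are never seen during training.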
To create the train.10samples.jsonl file (corresponding to the low-data profile), once you have your labels.train.txt, gather 10 random samples for each of those labels. Unfortunately, I can't find the script for this part, but it should not be too complicated.
Hope that answers your question!
Great, thank you for your response. I will try the steps you mentioned.
Also, were the contents of the folders 01, 02, 03, 04, 05 inside BANKING77/few_shot created just by running prepate-intent-dataset.py 5 times?
Yes, they are. You might want to use a different fixed seed for each run (e.g. seed = 1, ..., 5) so that you can reproduce the splits if you lose the files.
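The five-seed idea above can be sketched like this. The toy label list and the few_shot/01..05 folder layout are assumptions for illustration; in practice each run would go through prepate-intent-dataset.py with the corresponding seed.

```python
import os
import random

# Toy label list standing in for the real BANKING77 labels.
labels = ["card_arrival", "card_linking", "exchange_rate"]

# One deterministic run per seed, each writing into its own folder,
# mirroring the few_shot/01..05 layout.
for seed in range(1, 6):
    out_dir = os.path.join("few_shot", f"{seed:02d}")
    os.makedirs(out_dir, exist_ok=True)
    shuffled = labels[:]
    random.Random(seed).shuffle(shuffled)  # reproducible per-seed shuffle
    with open(os.path.join(out_dir, "labels.train.txt"), "w") as f:
        f.write("\n".join(shuffled) + "\n")
```

Because each folder's contents depend only on its seed, deleting the files and rerunning with the same seeds regenerates identical splits.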
Hello,
Thank you for your great paper and repo! I'd like to know what steps I need to follow to bring a new dataset into the format of your datasets. For example, how were all the files and folders in ProtAugment/data/BANKING77 generated from the original dataset?
Thank you.