Concatenation-based data augmentation always uses Task C labels

kanishk-adapt / semeval-task10

Repo for SemEval Task #10 EDOS 2023. Created and maintained for DCU-ADAPT submissions.

Concatenation-based data augmentation always uses Task C labels #25

Closed: jowagner closed this issue 1 year ago

jowagner commented 1 year ago

When sampling documents for concatenation-based data augmentation, documents are always grouped by their task C label, naively using the stored label in '%d.%d'/'none' format. For tasks A and B, we should instead group documents by the actual label of the target task so that

- more documents are available for sampling
- the expected number of training items is produced
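
The proposed grouping could look roughly like the sketch below. It is not the repository's code: `target_label` and `group_by_target_label` are hypothetical names, and the mapping assumes that the first number of a '%d.%d' label identifies the task B category and that 'none' marks the negative class in every task.

```python
from collections import defaultdict

def target_label(stored_label, task):
    """Map a stored task C label ('%d.%d' or 'none') to the target task's label.

    Assumption: the first integer of '%d.%d' is the task B category and
    'none' is the negative class for all three tasks.
    """
    if task not in ('a', 'b', 'c'):
        raise ValueError('unknown task %r' % task)
    if task == 'c' or stored_label == 'none':
        return stored_label
    if task == 'a':
        # Task A only distinguishes the negative class from everything else.
        return 'sexist'
    # Task B: keep only the category part before the dot.
    return stored_label.split('.')[0]

def group_by_target_label(documents, task):
    """Group (text, stored_label) pairs by the label of the target task."""
    groups = defaultdict(list)
    for text, stored_label in documents:
        groups[target_label(stored_label, task)].append(text)
    return dict(groups)
```

With this grouping, documents stored as '1.1' and '1.2' land in the same pool for task B, so each pool is larger than when grouping by the task C label.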

jowagner commented 1 year ago

Interestingly, at least for task B, mixing documents that have different task C labels (but the same task B label) harms performance.