A few datasets are used for training: NUCLE, Lang-8, FCE, WI and LOCNESS. Do you only use the training sets, or also the development and test sets?
Notably, you evaluate on the BEA-2019 dev set, which includes WI and LOCNESS, so I assume you train only on the training splits of the datasets above?
My confusion comes from your dataset sizes, which differ from those in the follow-up work: https://arxiv.org/pdf/2203.13064.pdf
It seems that you used the full FCE dataset for GECToR, but only the FCE training set for the ensembling paper.