Closed saramoeini20 closed 1 year ago
Hi! The parallel file format is used to train models on the data. The M2 format is needed to evaluate the output of systems and models. Please refer to https://github.com/nusnlp/m2scorer for the M2 format!
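To make the relationship between the two formats concrete, here is a minimal sketch of how one M2 block could be turned back into a parallel (source, corrected) pair. It is not part of m2scorer; the function name `apply_m2_edits` is hypothetical, and it assumes whitespace-tokenized sentences, edits listed in left-to-right order, and the usual `A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator` line layout.

```python
def apply_m2_edits(m2_block, annotator_id=0):
    """Return (source, corrected) for a single M2 block.

    Hypothetical helper: assumes edits are sorted by start offset
    and do not overlap.
    """
    lines = [ln for ln in m2_block.strip().splitlines() if ln.strip()]
    source_tokens = lines[0][2:].split()  # drop the leading "S "
    corrected = list(source_tokens)
    offset = 0  # how far token positions have shifted after earlier edits

    for line in lines[1:]:
        if not line.startswith("A "):
            continue
        fields = line[2:].split("|||")
        start, end = map(int, fields[0].split())
        edit_type, correction = fields[1], fields[2]
        annotator = int(fields[-1])
        # Skip other annotators and "no change" annotations (A -1 -1|||noop|||...).
        if annotator != annotator_id or edit_type == "noop" or start == -1:
            continue
        # An empty correction (or "-NONE-", as an assumption) means deletion.
        replacement = [] if correction in ("", "-NONE-") else correction.split()
        corrected[start + offset:end + offset] = replacement
        offset += len(replacement) - (end - start)

    return " ".join(source_tokens), " ".join(corrected)


example = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
"""
source, corrected = apply_m2_edits(example)
print(source)
print(corrected)
```

The source line would go into the source side of a parallel file and the corrected line into the target side, which is the shape typically used for training.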
So for training the model, regardless of which approach we select, we need the parallel file format? And by human-annotated data, do we mean the M2 format?
And because I want to do GEC for a low-resource language, I have to create the dataset myself. What should I do to get something like the M2 format? I saw ERRANT, but it is for English. How did you do that for your language? Should I modify ERRANT?
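For reference, a language-agnostic starting point could look like the sketch below. It is only an illustration under stated assumptions: it uses Python's standard `difflib` for token alignment rather than ERRANT's own alignment, it assumes whitespace tokenization, and it labels every edit with the placeholder type `UNK` because no language-specific error classification is done. The function name `parallel_to_m2` is hypothetical.

```python
import difflib


def parallel_to_m2(source, target, annotator_id=0):
    """Build one M2 block from a (source, corrected) sentence pair.

    Hypothetical helper: edit types are left as the placeholder "UNK".
    """
    src, tgt = source.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)

    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Replace, insert, and delete all map to "source span -> correction";
        # a deletion simply has an empty correction string.
        correction = " ".join(tgt[j1:j2])
        edits.append(
            f"A {i1} {i2}|||UNK|||{correction}|||REQUIRED|||-NONE-|||{annotator_id}"
        )

    if not edits:
        # Unchanged sentences get an explicit "noop" annotation.
        edits.append(
            f"A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||{annotator_id}"
        )

    return "\n".join(["S " + " ".join(src)] + edits)


print(parallel_to_m2("This are a sentence .", "This is a sentence ."))
```

Blocks produced this way would be written to a file separated by blank lines. The span boundaries from a generic diff may differ from what a linguistically informed aligner would choose, so the output is best treated as a rough draft for human review.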
I'm not quite sure if I got what you mean. Is it correct that you want the following?
If this is true, here are my answers:
It helped me. Thank you so much.
Since it seems like it's solved, changing the status to closed!
Hi, I have a question regarding training and test data. I have seen both the M2 format and the parallel file format used for GEC tasks. Could you please tell me which format is used in which situation?