How to evaluate the `output_file` using `m2scorer` and `errant`

grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

Apache License 2.0


How to evaluate the `output_file` using `m2scorer` and `errant` #173

Closed: hezy29 closed this issue 1 year ago

hezy29 commented 1 year ago

Hi, I'm trying to run the evaluation with m2scorer, but our model's output_file is in the special preprocessed format used to train the GECToR model rather than plain parallel sentences.

How can I use m2scorer in this particular circumstance to evaluate the model performance? Thanks!

hezy29 commented 1 year ago

I figured out that only the training process needs the data preprocessed into the special format. The prediction process only needs the plain parallel .src source text as input. Thanks for your work!
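For anyone landing here later, this is a rough sketch of how the evaluation step could be wired up once prediction has produced a plain-text output_file, one corrected sentence per line. It only assembles the command lines and runs them on request. The m2scorer call follows that tool's documented `SYSTEM SOURCE_GOLD` argument order, and the ERRANT calls use its `errant_parallel` and `errant_compare` commands. The script path `m2scorer/scripts/m2scorer.py`, the interpreter name, the intermediate file `hyp.m2`, and the helper names `build_eval_commands` and `run_eval` are assumptions for illustration, not something from this repo.

```python
import subprocess


def build_eval_commands(src_file, hyp_file, gold_m2,
                        hyp_m2="hyp.m2",
                        m2scorer_script="m2scorer/scripts/m2scorer.py",
                        python_bin="python"):
    """Assemble the evaluation command lines (hypothetical helper).

    src_file: original tokenized sentences, one per line
    hyp_file: the model's corrected sentences, one per line
    gold_m2:  gold-standard annotations in M2 format
    The default paths are assumptions; adjust them to your checkout.
    """
    return {
        # m2scorer takes the system output first, then the gold M2 file.
        "m2scorer": [python_bin, m2scorer_script, hyp_file, gold_m2],
        # ERRANT first aligns source and hypothesis into a hypothesis M2 file.
        "errant_parallel": ["errant_parallel", "-orig", src_file,
                            "-cor", hyp_file, "-out", hyp_m2],
        # Then it compares the hypothesis M2 against the reference M2.
        "errant_compare": ["errant_compare", "-hyp", hyp_m2, "-ref", gold_m2],
    }


def run_eval(commands):
    """Run the commands in order; requires m2scorer and errant to be installed."""
    for name in ("m2scorer", "errant_parallel", "errant_compare"):
        print("$", " ".join(commands[name]))
        subprocess.run(commands[name], check=True)


if __name__ == "__main__":
    cmds = build_eval_commands("test.src", "test.out", "test.gold.m2")
    for name, cmd in cmds.items():
        print(name, "->", " ".join(cmd))
```

Note that the original m2scorer was written for Python 2, so `python_bin` may need to point at a matching interpreter or at a Python 3 port.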

Lj4040 commented 1 year ago

@hezy29 Hello, is the stage 2 training file produced by converting the M2 file into two parallel files and then preprocessing them? May I ask how you convert M2 data into parallel files? I would really appreciate your help. Thank you.

hezy29 commented 1 year ago

Hi @Lj4040, the code to convert .m2 files into parallel plain-text files can be found here.
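In case the link is not available, here is a minimal sketch of the general idea, not the linked script itself. It assumes the standard M2 layout: an `S` line with the tokenized source, followed by `A start end|||type|||correction|||REQUIRED|||-NONE-|||annotator` lines, with blocks separated by blank lines and edits listed left to right. The function name `m2_to_parallel` and the choice to skip `noop`, `UNK` and `Um` edits are assumptions for illustration.

```python
import re


def m2_to_parallel(m2_text, annotator_id=0):
    """Return (source, corrected) sentence pairs from M2-formatted text.

    Only the edits of one annotator are applied. Edits are assumed to be
    sorted by position and non-overlapping, as in standard M2 files.
    """
    pairs = []
    for block in re.split(r"\n\s*\n", m2_text.strip()):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        if not lines or not lines[0].startswith("S"):
            continue
        source_tokens = lines[0].split()[1:]  # drop the leading "S"
        corrected = list(source_tokens)
        offset = 0  # shift caused by earlier insertions and deletions
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            edit_type = fields[1]
            # Skip "no edit" markers and edits without a usable correction.
            if edit_type in ("noop", "UNK", "Um"):
                continue
            if int(fields[-1]) != annotator_id:
                continue
            replacement = fields[2].split()  # empty list means deletion
            corrected[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
        pairs.append((" ".join(source_tokens), " ".join(corrected)))
    return pairs


if __name__ == "__main__":
    sample = (
        "S This are a sentence .\n"
        "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
        "\n"
        "S I like like cats\n"
        "A 1 2|||U:VERB||||||REQUIRED|||-NONE-|||0\n"
        "A 4 4|||M:PUNCT|||.|||REQUIRED|||-NONE-|||0\n"
    )
    for src, tgt in m2_to_parallel(sample):
        print(src, "\t", tgt)
```

Writing the first element of each pair to one file and the second to another, line by line, gives the two parallel files that the preprocessing step expects.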