grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Apache License 2.0
891 stars, 216 forks

Data/output Structure #190

Open saramoeini20 opened 1 year ago

saramoeini20 commented 1 year ago

Hey, I found the format of the source and target files in previous issues, but what is the format of the output after preprocessing? Is it something like tokenized inputs? And if I want to create a dataset for training, should it be in the source/target file format or in M2 format? Which one is more helpful?