Why do we need to train for Stage II and Stage III? Why not just train for one stage on the annotated dataset?

grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)

Apache License 2.0


Issue #106

Status: Closed (chengyang00 closed this issue 3 years ago)

chengyang00 commented 3 years ago

I want to know why doing this can improve the performance. Thanks!

abhinavdayal commented 3 years ago

My understanding: Stage 1 uses synthetic data, which is huge, so the main training is done on it. Stages 2 and 3 use manually annotated, accurate data containing the kinds of errors humans actually make. That data is tiny compared to the synthetic data, which is why they call those stages fine-tuning rather than training.
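To make the intuition concrete, here is a toy sketch in plain Python. It is not GECToR's training code: the one-parameter model, the data, the learning rates, and the helper `sgd_stage` are all hypothetical and chosen only for illustration. It compares fine-tuning on a small clean set after pre-training on a large, slightly biased set against training on the small clean set alone with the same budget.

```python
import random


def sgd_stage(w, data, lr, epochs):
    """Run plain SGD on squared error for the model y = w * x, starting from w."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


random.seed(0)
TRUE_W = 3.0  # hypothetical "correct" behaviour the model should learn

# Stage 1 stand-in: a large synthetic set that is noisy and slightly
# biased (slope 2.6 instead of 3.0), i.e. plentiful but lower quality.
synthetic = []
for _ in range(5000):
    x = random.uniform(-1, 1)
    synthetic.append((x, 2.6 * x + random.gauss(0, 0.5)))

# Stage 2/3 stand-in: a small, clean, "manually annotated" set.
annotated = []
for _ in range(50):
    x = random.uniform(-1, 1)
    annotated.append((x, TRUE_W * x))

# Staged training: pre-train on synthetic data, then fine-tune on the
# annotated data with a smaller learning rate.
w_stage1 = sgd_stage(0.0, synthetic, lr=0.05, epochs=1)
w_final = sgd_stage(w_stage1, annotated, lr=0.01, epochs=5)

# Baseline: the same fine-tuning budget on annotated data only, from scratch.
w_scratch = sgd_stage(0.0, annotated, lr=0.01, epochs=5)

print("after stage 1:   ", w_stage1)
print("after fine-tune: ", w_final)
print("annotated only:  ", w_scratch)
```

In this toy setting the pre-trained weight starts much closer to the target, so the same small amount of clean data brings it closer than training from scratch does. Whether the gap is this large for a real tagging model is an empirical question; the sketch only illustrates the general idea of coarse training on plentiful data followed by refinement on scarce, accurate data.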

skurzhanskyi commented 3 years ago

Thanks for answering this. You're right – different stages have data of different quality.