grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Apache License 2.0

stage 2 training problem #180

Open ruiyeNLP opened 1 year ago

ruiyeNLP commented 1 year ago

In stage 2, four datasets are used for training, as described in the paper. Maybe it is a stupid question: are the four datasets trained on all together, or one by one? If they are trained all together, can the 'cat' command simply be used to combine them? Looking forward to your reply.

gotutiyan commented 1 year ago

Typically, the four datasets are used together (whether for GECToR or for seq2seq models). If the datasets were used one by one, it would be 4-stage training. Given that there is no such description in the GECToR paper, it is natural to use them together.

skurzhanskyi commented 1 year ago

@gotutiyan is right. We used all four datasets together by mixing them (concatenating and shuffling).
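
Since the answer above confirms the datasets are simply concatenated and shuffled, here is a minimal Python sketch of that step. The file names are hypothetical placeholders, and it assumes each dataset has already been preprocessed into GECToR's one-example-per-line training format:

```python
import random

# Hypothetical paths to the four preprocessed stage-2 datasets;
# replace with your actual file names.
dataset_paths = [
    "lang8.train.txt",
    "nucle.train.txt",
    "fce.train.txt",
    "wi_locness.train.txt",
]

# Concatenate all examples into a single list.
lines = []
for path in dataset_paths:
    with open(path, encoding="utf-8") as f:
        lines.extend(f.readlines())

# Shuffle so that each training batch draws from all four datasets.
random.seed(42)  # fixed seed for reproducibility
random.shuffle(lines)

with open("stage2.train.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)
```

Equivalently, on the command line, piping the output of `cat` through `shuf` achieves the same concatenate-and-shuffle result.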