grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Apache License 2.0

bash file as example #176

Closed ruiyeNLP closed 1 year ago

ruiyeNLP commented 1 year ago

Hi all, thanks again for your excellent work. Could you please add a bash file or an example showing how to train GECToR? I went through your paper but am still unsure how to train the model. The main thing I am unclear about is the three training stages, which involve many datasets in different formats. It would help a lot if you could add an example or a bash file for one complete training pipeline. Looking forward to your reply.

mughal41 commented 1 year ago

Hi @ruiyeNLP, please have a look at the project's README if you want to reproduce the results reported in the paper.

To start training the model, these are the steps you should follow:

  1. Gather the data for the first stage. The sources are listed in the Dataset section of the project's README.
  2. Convert the M2-format file into two parallel files. I guess this will produce something like corr_sent.txt and incorr_sent.txt.
  3. Once you have the two parallel files for both the train and dev sets, use the project's preprocessing script described in the README to generate train_set.txt and dev_set.txt.
  4. Load these files into train.py and train your model following these params

skurzhanskyi commented 1 year ago

Thank you, @mughal41. You are absolutely right.

Lj4040 commented 1 year ago

@mughal41 Could you explain how to convert the M2-format file into two parallel files? I would appreciate your help. Was it generated from the error.py file?
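
For anyone with the same question, here is a minimal sketch of the M2-to-parallel conversion from step 2. It is not taken from the GECToR repository. It assumes the common M2 layout: an `S` line with the tokenized source sentence, followed by `A start end|||type|||correction|||...|||annotator` lines, with sentences separated by blank lines and edits listed in left-to-right order. The function name `m2_to_parallel` and the skipped edit types are illustrative choices, not project API.

```python
# Sketch: turn M2 annotations into parallel source/corrected sentences.
# Assumption: edits for one annotator are non-overlapping and sorted by start.

SKIP_TYPES = {"noop", "UNK", "Um"}  # edit types that carry no usable correction


def m2_to_parallel(m2_text, annotator_id=0):
    """Return (source_sentences, corrected_sentences) as lists of strings."""
    sources, targets = [], []
    src_tokens, cor_tokens, offset = None, None, 0

    def flush():
        # Store the sentence collected so far, if any.
        if src_tokens is not None:
            sources.append(" ".join(src_tokens))
            targets.append(" ".join(cor_tokens))

    for line in m2_text.splitlines():
        line = line.strip()
        if line.startswith("S "):
            flush()
            src_tokens = line.split()[1:]
            cor_tokens = list(src_tokens)
            offset = 0  # shift caused by earlier edits in this sentence
        elif line.startswith("A ") and src_tokens is not None:
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            error_type, correction = fields[1], fields[2]
            if error_type in SKIP_TYPES or int(fields[-1]) != annotator_id:
                continue
            replacement = correction.split()  # empty list means deletion
            cor_tokens[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
    flush()
    return sources, targets


if __name__ == "__main__":
    sample = (
        "S This are a sentence .\n"
        "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
        "\n"
        "S Fine sentence .\n"
        "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
    )
    incorrect, correct = m2_to_parallel(sample)
    for src, tgt in zip(incorrect, correct):
        print(src, "->", tgt)
```

Writing the two returned lists line by line to files such as incorr_sent.txt and corr_sent.txt (the names guessed in step 2) would give the parallel pair that the preprocessing script expects as source and target.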