grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Apache License 2.0

Predict @@UNKNOWN@@ during prediction #160

Closed tuzeao closed 2 years ago

tuzeao commented 2 years ago

Hi. Recently I have been working with this fantastic model. I generated my own data, trained the model, and made predictions. Everything seemed to work great. But when I checked the output, something strange happened: in my predictions, many character-level edit operations are predicted as @@UNKNOWN@@, like this: [image]

I don't think anything is wrong with my training process. I generate source and target sentences, split them into two files, tokenize them with the BERT tokenizer, then use the preprocess step to convert them to the correct format for train.py. I have only 4 types of edit operations because I am applying this model to Chinese, but that's OK for my use case.

Any ideas on how this could happen? I have checked all the issues and it seems no one has had the same problem. I have been stuck here for about two days, so I would be very thankful if someone could give some advice.

tuzeao commented 2 years ago

OK, I figured it out. The cause was a gap between the training data and labels.txt: if you add your own personalized transforms but forget to add them to labels.txt, this happens.