Closed suttergustavo closed 1 year ago
This is just the setup in which the model was trained.
I see, but when training from scratch, is there a difference between using it or not? My question is basically: if I want to try a new Transformer backbone, should I use it or not?
Yes, I would rather train it with special_tokens_fix enabled, to avoid splitting the $START token into subwords.
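To make the splitting issue concrete, here is a minimal sketch (not GECToR's actual code) of a toy greedy subword tokenizer. The vocabulary entries and the `tokenize` helper are purely illustrative assumptions; the point is that a marker like `$START` gets broken into pieces unless it is added to the vocabulary as a single entry, which is what special_tokens_fix effectively ensures.

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization over a toy vocab."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to a single char
            i += 1
    return tokens

# Illustrative subword vocabulary (hypothetical).
base_vocab = {"$", "ST", "ART", "the", "cat"}

# Without the fix, $START is shattered into subwords:
print(tokenize("$START", base_vocab))   # ['$', 'ST', 'ART']

# With $START added as one vocabulary entry, it stays intact:
fixed_vocab = base_vocab | {"$START"}
print(tokenize("$START", fixed_vocab))  # ['$START']
```

If the model's embeddings were trained with `$START` as one token, the split version would feed it three unrelated subword embeddings instead, which is why training and inference must agree on this setting.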
Makes sense, thanks very much for the answers!
I understand that the special_tokens_fix is adding the $START token to the vocabulary, but can someone explain why we only do that for the RoBERTa model?