grammarly / gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite" (BEA-20) and "Text Simplification by Tagging" (BEA-21)
Apache License 2.0

delimeters in preprocess_data #127

Closed lishengfever closed 2 years ago

lishengfever commented 3 years ago

https://github.com/grammarly/gector/blob/79b6af4d54a1e5b3270866c9d5c9c7d612ee30df/utils/preprocess_data.py#L103

A space is not a token separator in Chinese, so this line of code produces wrong results when processing Chinese error correction tasks.

Changing it to if del_val in sent and del_val != delimeters['tokens']: would be more accurate.
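To illustrate the difference, here is a minimal sketch. The function names, the delimiter values, and the tab-separated example sentence are assumptions made for illustration; they are not copied from the repository, and the real check may be structured differently.

```python
# Hypothetical delimiter table where tokens are separated by a tab
# instead of a space (values are assumptions for this sketch).
TAB_DELIMETERS = {
    "tokens": "\t",
    "labels": "SEPL|||SEPR",
    "operations": "SEPL__SEPR",
}


def is_sent_ok_hardcoded(sent, delimeters):
    """Reject a sentence containing any delimiter except a literal space."""
    for del_val in delimeters.values():
        if del_val in sent and del_val != " ":
            return False
    return True


def is_sent_ok_proposed(sent, delimeters):
    """Reject a sentence containing any delimiter except the configured token separator."""
    for del_val in delimeters.values():
        if del_val in sent and del_val != delimeters["tokens"]:
            return False
    return True


# A Chinese sentence whose tokens are joined with the configured separator.
sentence = "\t".join(["我", "喜欢", "苹果"])

# The hardcoded check treats the tab as a forbidden delimiter.
hardcoded_result = is_sent_ok_hardcoded(sentence, TAB_DELIMETERS)

# The proposed check skips whichever token separator is configured.
proposed_result = is_sent_ok_proposed(sentence, TAB_DELIMETERS)
```

Under these assumptions the hardcoded comparison rejects every sentence as soon as the token separator is anything other than a space, while the proposed comparison still rejects sentences that contain the label or operation delimiters.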

skurzhanskyi commented 2 years ago

Thanks for reporting this! Fixed in https://github.com/grammarly/gector/commit/b9a94ed5d800d61c223fdb0e945c0ed145d400bc