lishengfever closed this issue 2 years ago
https://github.com/grammarly/gector/blob/79b6af4d54a1e5b3270866c9d5c9c7d612ee30df/utils/preprocess_data.py#L103
Chinese does not use spaces as token separators, so this line produces wrong results when preprocessing data for Chinese error correction tasks.
Changing it to `if del_val in sent and del_val != delimeters['tokens']:` would be more accurate.
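
To illustrate the proposed change, here is a minimal sketch. It assumes the check lives in a helper that walks over a delimiter dictionary and that the current code compares against a hard-coded space. The function name `is_sent_ok`, the `DELIMETERS` map, and its values are illustrative assumptions, not copied from the repository.

```python
# Hypothetical delimiter map for illustration; the repository's values may differ.
DELIMETERS = {
    "tokens": " ",
    "labels": "SEPL|||SEPR",
    "operations": "SEPL__SEPR",
}


def is_sent_ok(sent, delimeters=DELIMETERS):
    """Return False if `sent` contains any reserved delimiter
    other than the configured token separator."""
    for del_val in delimeters.values():
        # Compare against the configured token separator instead of
        # a hard-coded " ", so non-space separators are accepted too.
        if del_val in sent and del_val != delimeters['tokens']:
            return False
    return True


# Assumed setup for Chinese data: tokens separated by tabs, not spaces.
zh_delimeters = {
    "tokens": "\t",
    "labels": "SEPL|||SEPR",
    "operations": "SEPL__SEPR",
}

ok_default = is_sent_ok("I like tea")
ok_chinese = is_sent_ok("我\t喜欢\t茶", zh_delimeters)
bad_chinese = is_sent_ok("我\tSEPL|||SEPR\t茶", zh_delimeters)
```

With a hard-coded `del_val != " "`, the tab-separated sentence would be rejected, because the tab is itself one of the delimiter values and is not equal to a space. Comparing against `delimeters['tokens']` keeps the check tied to whichever token separator is configured.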
Thanks for reporting this! Fixed in https://github.com/grammarly/gector/commit/b9a94ed5d800d61c223fdb0e945c0ed145d400bc