google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Zero Width Joiner issue for Sinhala Language #1031

Open Nadil-K opened 2 months ago

Nadil-K commented 2 months ago

Even though this issue appears to have been resolved by #629, I still see the zero-width joiner (U+200D) being replaced with whitespace when processing Sinhala text. Is there a solution or workaround for this?
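To make the symptom easier to check, here is a minimal sketch that counts zero-width joiners before and after processing. It uses only the Python standard library; the helper name `zwj_report` is hypothetical, and the `str.replace` call is only a stand-in that mimics the reported behavior, not the actual sentencepiece code path.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def zwj_report(original: str, processed: str) -> dict:
    """Compare the number of ZWJ characters before and after processing."""
    zwj_in = original.count(ZWJ)
    zwj_out = processed.count(ZWJ)
    return {"zwj_in": zwj_in, "zwj_out": zwj_out, "preserved": zwj_in == zwj_out}


# Sinhala "sri" written with explicit escapes:
# SHA, AL-LAKUNA, ZWJ, RA, VOWEL SIGN DIGA IS-PILLA
text = "\u0dc1\u0dca\u200d\u0dbb\u0dd3"

# Stand-in for the reported symptom (ZWJ turned into a space).
# In a real check, `processed` would be the text obtained after
# encoding and decoding with the trained model.
processed = text.replace(ZWJ, " ")

print(zwj_report(text, processed))
```

If `preserved` comes back `False` for text round-tripped through the model, the joiner was dropped or replaced somewhere in normalization or tokenization.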