google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
Apache License 2.0

Zero Width Joiner issue for Sinhala Language #1031

Open Nadil-K opened 5 months ago

Nadil-K commented 5 months ago

Although #629 appears to have resolved this issue, I still see the zero width joiner (U+200D) being replaced with whitespace when tokenizing Sinhala text. Is there a solution for this?
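A minimal sketch of how the symptom could be checked, using only the Python standard library. It does not call SentencePiece: `zwj_report` and `codepoints` are hypothetical helpers, and the `simulated` string merely stands in for whatever text comes back from the tokenizer's encode/decode round trip. The sample word is the Sinhala conjunct "sri", which needs U+200D to render correctly.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Sinhala "sri": sha + al-lakuna (virama) + ZWJ + ra + vowel sign ii.
SAMPLE = "\u0dc1\u0dca\u200d\u0dbb\u0dd3"


def zwj_report(original: str, processed: str) -> dict:
    """Count ZWJ characters before and after a text-processing step."""
    before = original.count(ZWJ)
    after = processed.count(ZWJ)
    return {"before": before, "after": after, "lost": before - after}


def codepoints(text: str) -> list:
    """List the code points of a string so invisible characters show up."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # Assumption: this mimics the reported behaviour (ZWJ turned into a space).
    # In a real check, replace it with the decoded output of the tokenizer.
    simulated = SAMPLE.replace(ZWJ, " ")
    print(codepoints(SAMPLE))
    print(codepoints(simulated))
    print(zwj_report(SAMPLE, simulated))
```

Printing code points rather than the strings themselves makes it easy to see whether U+200D survived or was swapped for U+0020, which is hard to tell from rendered text alone.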