thisandagain / washyourmouthoutwithsoap

A list of bad words in many languages.
MIT License
97 stars 20 forks

Support Chinese #1

Open thisandagain opened 6 years ago

thisandagain commented 6 years ago

Add support for Simplified and Traditional Chinese. Because Chinese text does not separate words with spaces, this will require changing how the input phrase is tokenized.
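To illustrate why the tokenization strategy matters, here is a minimal sketch in Python. It is not this library's code: the word list, `tokenize_whitespace`, and `find_listed_words` are hypothetical names made up for the example, and the entries are mild placeholders. It assumes a tokenizer that splits on whitespace, and contrasts that with a greedy longest-match scan against the word list, which needs no spaces in the input.

```python
# Hypothetical word list (not taken from the library), with one entry
# given in both Simplified and Traditional forms.
LEXICON = {"笨蛋", "坏蛋", "壞蛋"}


def tokenize_whitespace(phrase):
    """Assumed baseline: split the phrase on whitespace."""
    return phrase.lower().split()


def find_listed_words(phrase, lexicon):
    """Greedy longest-match scan that does not rely on spaces.

    At each position, try the longest candidate first and shrink until
    a lexicon entry matches; otherwise advance by one character.
    """
    max_len = max((len(word) for word in lexicon), default=0)
    hits = []
    i = 0
    while i < len(phrase):
        match = None
        for size in range(min(max_len, len(phrase) - i), 0, -1):
            candidate = phrase[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match is not None:
            hits.append(match)
            i += len(match)
        else:
            i += 1
    return hits


phrase = "你是笨蛋吗"
# Whitespace splitting yields the whole sentence as a single token,
# so no token equals a lexicon entry.
whitespace_hits = [t for t in tokenize_whitespace(phrase) if t in LEXICON]
scan_hits = find_listed_words(phrase, LEXICON)
```

A plain substring scan like this can match across real word boundaries, so it may produce false positives. A proper word segmenter, such as the ones listed in the references, would likely be the more robust choice.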

References

https://nlp.stanford.edu/software/segmenter.shtml
https://github.com/yishn/chinese-tokenizer