jacksonllee / pycantonese

Cantonese Linguistics and NLP
https://pycantonese.org
MIT License
354 stars, 38 forks

Where are all the profanities? #14

Closed. killvung closed this issue 7 years ago

killvung commented 7 years ago

I tried to look up 門氏五虎將 and some other phrases like 仆街 or "Collect skin", but none of them are available.

Should I implement them?

Thank you

jacksonllee commented 7 years ago

The current built-in corpus data in PyCantonese is not a wordlist or dictionary, but the HKCanCor conversational data. So if what you're after isn't found in HKCanCor, then it's simply not there. Let's hope there will be more Cantonese text data available (especially the more informal kind -- for what you're looking for :) ) with the appropriate license terms for open-source usage!
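To make "if it isn't in HKCanCor, it's simply not there" concrete, here is a minimal sketch of checking whether given words occur in a segmented corpus. The helper name `corpus_counts` and the toy word list are hypothetical and only for illustration; they are not part of PyCantonese.

```python
from collections import Counter
from typing import Dict, Iterable


def corpus_counts(words: Iterable[str], queries: Iterable[str]) -> Dict[str, int]:
    """Count how often each query occurs as a whole word token in `words`.

    Hypothetical helper for illustration; not a PyCantonese function.
    """
    counts = Counter(words)
    # Counter returns 0 for tokens it has never seen, so absent words show up as 0.
    return {query: counts[query] for query in queries}


# Toy stand-in for a segmented corpus, so the sketch runs without PyCantonese.
toy_words = ["你", "食", "咗", "飯", "未", "食", "咗"]

print(corpus_counts(toy_words, ["食", "仆街"]))
```

With PyCantonese installed, the word list from `pycantonese.hkcancor().words()` could presumably be passed in place of `toy_words`; a count of 0 then means the word does not occur as a token in HKCanCor, not that the library lacks a dictionary entry for it.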

killvung commented 7 years ago

Wait! You don't need open-source license terms just for Cantonese profanity specifically, right? It wouldn't be difficult to add more of it to HKCanCor...