grantjenks / python-wordsegment

English word segmentation, written in pure-Python, and based on a trillion-word corpus.
http://www.grantjenks.com/docs/wordsegment/

Added ability to load custom corpuses more easily #11

Closed bgbg closed 7 years ago

bgbg commented 7 years ago

To use a corpus other than the default one, a series of actions needs to be performed (as described here). This change automates that process without changing the existing API.
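For readers unfamiliar with what swapping the corpus involves, here is a minimal, self-contained sketch of the general idea: build word counts from your own text, then segment against those counts. It is not wordsegment's actual code or API. The names `build_unigrams` and `make_segmenter` are hypothetical, the model is unigram-only, and the length-based penalty for unseen words is an assumption chosen for illustration.

```python
import math
from collections import Counter
from functools import lru_cache


def build_unigrams(text):
    """Count lowercase, whitespace-separated tokens in a custom corpus."""
    return Counter(text.lower().split())


def make_segmenter(unigrams, max_word_len=24):
    """Return a segment() function bound to the given unigram counts.

    Hypothetical helper for illustration; not part of wordsegment.
    """
    total = sum(unigrams.values())

    def score(word):
        # Log10 probability of a known word.
        if word in unigrams:
            return math.log10(unigrams[word] / total)
        # Assumed penalty for unseen words: shrinks tenfold per character.
        return math.log10(10.0 / (total * 10 ** len(word)))

    def segment(text):
        @lru_cache(maxsize=None)
        def best(start):
            # Best (score, words) for the suffix text[start:].
            if start == len(text):
                return 0.0, ()
            candidates = []
            last = min(len(text), start + max_word_len)
            for end in range(start + 1, last + 1):
                word = text[start:end]
                rest_score, rest_words = best(end)
                candidates.append((score(word) + rest_score, (word,) + rest_words))
            return max(candidates)

        return list(best(0)[1])

    return segment


# A tiny stand-in for a custom corpus.
corpus = "the quick brown fox jumps over the lazy dog the fox"
segment = make_segmenter(build_unigrams(corpus))
print(segment("thelazyfox"))
```

With a corpus this small, only words that appear in it can be recovered reliably. A real replacement corpus would need far more text, and bigram counts as well if the segmenter uses them.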

grantjenks commented 7 years ago

Out of curiosity, how often do you use a different corpus? And if I may ask, what other corpus do you use?

bgbg commented 7 years ago

Possible scenarios: corpuses tailored to different target audiences, or corpuses for different languages.

grantjenks commented 7 years ago

Superseded by v1.1.3 or later, which is deployed on PyPI.

Do you use wordsegment as part of your work at Automattic? I would love to include a testimonial from you if the package has been useful.

bgbg commented 7 years ago

I used this library for an ad-hoc analysis but decided to adopt another approach.