rkcosmos / deepcut

A Thai word tokenization library using Deep Neural Network
MIT License
420 stars 96 forks

Problems with numbers #72

Open watkru opened 3 years ago

watkru commented 3 years ago

Numbers that contain commas are split at each comma. For example, 2,000 is tokenized as '2', ',' and '000' instead of being kept as a single token.
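As a possible stopgap, the split tokens could be re-joined after tokenization. The sketch below is only an illustration, not part of deepcut: `merge_number_tokens` is a hypothetical helper, and it assumes the tokenizer returns a plain list of strings in which a thousands separator shows up as a standalone ',' token followed by exactly three digits.

```python
import re

# Hypothetical post-processing helper (not part of deepcut).
_NUMBER = re.compile(r"\d+")   # leading digit group, e.g. "2"
_GROUP = re.compile(r"\d{3}")  # a thousands group, e.g. "000"


def merge_number_tokens(tokens):
    """Re-join tokens such as ['2', ',', '000'] into ['2,000']."""
    merged = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if _NUMBER.fullmatch(tok):
            # Keep absorbing ',' + three-digit group while the pattern continues.
            while (
                i + 2 < len(tokens)
                and tokens[i + 1] == ","
                and _GROUP.fullmatch(tokens[i + 2])
            ):
                tok += "," + tokens[i + 2]
                i += 2
        merged.append(tok)
        i += 1
    return merged


# Token list as reported in this issue for the input "2,000".
tokens = ["2", ",", "000"]
print(merge_number_tokens(tokens))  # ['2,000']
```

In practice the input list would come from `deepcut.tokenize(text)`. Requiring exactly three digits after each comma keeps lists such as "1, 2" from being merged, but it would still wrongly join something like "5,100" when the comma was meant as a list separator, so this is a workaround rather than a fix for the tokenizer itself.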