hightman / scws

A free, open-source, simple Chinese word segmentation system; a top choice for word segmentation in PHP!
http://www.xunsearch.com/scws/

No word segmentation #52

Closed donnekgit closed 6 years ago

donnekgit commented 6 years ago

I've downloaded and compiled scws (Ubuntu 14.04).

If I run it at the commandline, I get no segmentation:

$ scws -c utf8 '她令人紧张不 安。'
她  令  人  紧  张  不  安 。

instead of what I would expect (and which I get on the webdemo):

她  令人  紧张  不  安 。

Do I need to do anything else to configure scws?

hightman commented 6 years ago

You should specify the dictionary file by passing the '-d' option.

donnekgit commented 6 years ago

$ scws -c utf8 -d '她令人紧张不 安。'
WARNING: failed to add dict file: 她令人紧张不 安。

Do I need to install the dictionary separately?

donnekgit commented 6 years ago

In fact, the dictionary doesn't seem to be included in the scws-1.2.3 download, so it has to be fetched separately. For the utf8 dictionary:

wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
tar xvjf scws-dict-chs-utf8.tar.bz2

as set out here. Then:

$ scws -c utf8 -d ../dict.utf8.xdb '她令人紧张不安。'

gives the expected output.
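
For anyone hitting the same warning: in the earlier attempt, '-d' was given no path, so scws treated the sentence itself as the dictionary file. A minimal sketch of a wrapper that guards against this is below. The function name `segment`, the `SCWS_DICT` variable, and the default path are my own assumptions for illustration; only the `-c` and `-d` options shown in this thread are used.

```shell
#!/bin/sh
# Hypothetical helper, not part of scws itself.
# DICT is an assumed location; point it at wherever dict.utf8.xdb was extracted.
DICT="${SCWS_DICT:-./dict.utf8.xdb}"

segment() {
    # $1 is the text to segment.
    # Refuse to run if the dictionary is missing, instead of letting
    # scws fall back to splitting the text character by character.
    if [ ! -f "$DICT" ]; then
        echo "dictionary not found: $DICT" >&2
        return 1
    fi
    # '-d' must be followed by the dictionary path, then the text.
    scws -c utf8 -d "$DICT" "$1"
}
```

With the file sourced, it could be called as `SCWS_DICT=../dict.utf8.xdb` followed by `segment '她令人紧张不安。'`.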