jaiminpan / pg_jieba

PostgreSQL full-text search extension for Chinese
BSD 3-Clause "New" or "Revised" License
338 stars 65 forks

rel_v1.0.1 branch: wrong part-of-speech tags returned for tokens #18

Closed haopingpang closed 5 years ago

haopingpang commented 6 years ago

About querying the part of speech of tokens, e.g.

    select * from ts_debug('jiebacfg','中央景城A区');

Every token comes back with the part of speech n (noun). Looking at the source file pg_jieba.c, I found that the function below sets type = JB_N;, so type is always 2 and every token is reported as a noun. However, if I comment out type = JB_N; and uncomment type = (int)(curr->attr)[0];, recompile, and test again in PostgreSQL with

    select * from to_tsvector('jiebacfg', '小明硕士毕业于中国科学院计算所,后在日本京都大学深造');

the result is empty.

    // relevant function
    Datum
    jieba_gettoken(PG_FUNCTION_ARGS)
    {
        ParserState *pst = (ParserState *) PG_GETARG_POINTER(0);
        char **t = (char **) PG_GETARG_POINTER(1);
        int *tlen = (int *) PG_GETARG_POINTER(2);
        int type = -1;

        JiebaResult *curr = Jieba_GetNext(pst->ctx);

        /* already done the work, or no sentence */
        if (curr == NULL)
        {
            *tlen = 0;
            type = 0;
            PG_RETURN_INT32(type);
        }

        //type = (int)(curr->attr)[0];
        type = JB_N;
        *tlen = curr->len;
        *t = curr->str;

        PG_RETURN_INT32(type);
    }

Also, the master branch (the C++ version) fails to compile.
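One possible explanation for the empty to_tsvector result, not confirmed anywhere in this thread: if curr->attr is a tag string such as "n", then (int)(curr->attr)[0] yields the character code of its first letter (110 for 'n' in ASCII) rather than a token type ID such as JB_N (2). PostgreSQL discards tokens whose type has no dictionary mapping in the text search configuration, so an unmapped ID would produce an empty tsvector. The sketch below shows one way to translate a tag string into a small fixed set of type IDs. Everything except the name JB_N and its value 2 is hypothetical (JB_UNKNOWN, JB_V, JB_A, PosMapEntry, pos_map, pos_to_token_type), and this is not the extension's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical token type IDs. Only JB_N == 2 comes from the thread;
 * the other names and values are assumptions for illustration.
 * 0 is avoided because a parser's gettoken function returns 0 to
 * signal that there are no more tokens.
 */
enum { JB_UNKNOWN = 1, JB_N = 2, JB_V = 3, JB_A = 4 };

/* One row of the (assumed) tag-to-type lookup table. */
typedef struct
{
    const char *tag;   /* part-of-speech tag as a string, e.g. "n" */
    int         type;  /* token type ID handed back to PostgreSQL */
} PosMapEntry;

static const PosMapEntry pos_map[] = {
    {"n", JB_N},
    {"v", JB_V},
    {"a", JB_A},
};

/*
 * Map a part-of-speech tag to a token type ID.
 * Unknown or missing tags fall back to JB_UNKNOWN so that the
 * caller never returns an ID outside the declared set.
 */
static int
pos_to_token_type(const char *tag)
{
    size_t i;

    if (tag == NULL)
        return JB_UNKNOWN;

    for (i = 0; i < sizeof(pos_map) / sizeof(pos_map[0]); i++)
    {
        if (strcmp(tag, pos_map[i].tag) == 0)
            return pos_map[i].type;
    }
    return JB_UNKNOWN;
}
```

With a helper like this, the assignment in jieba_gettoken could become type = pos_to_token_type(curr->attr);, provided every ID it can return is also declared by the parser's lextype function and mapped to a dictionary in the jiebacfg configuration.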

jaiminpan commented 6 years ago

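Regarding part-of-speech tags: when development started, the C version of jieba that this project depended on did not support them. The current cppjieba version does, so you are very welcome to add this feature to the PG word-segmentation extension. As for the compile problem on master, the README has a note about it; please confirm that your compiler supports C++11: "The master branch requires C++11 (gcc 4.8+), because the new version of cppjieba was upgraded to C++11. If the OS compiler does not support C++11, please try an older version of pg_jieba such as v1.0.1."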

jaiminpan commented 5 years ago

The latest master now supports part-of-speech tags.