zzmjohn / mmseg4j

Automatically exported from code.google.com/p/mmseg4j
Apache License 2.0

Request: support segmenting terms that mix Chinese with English letters or with digits #35

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
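In real-world segmentation we run into terms such as "比亚迪F3", "马自达2", "蜘蛛侠3", and "7天连锁酒店", which mix Chinese with Latin letters or Chinese with digits. I tried adding these terms to the dictionary, but the segmenter still splits the letters and digits out as separate tokens.

Is there a version in which this could be supported? Thanks.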

Original issue reported on code.google.com by sens...@gmail.com on 3 Oct 2012 at 3:56
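
To make the requested behavior concrete, here is a minimal Java sketch of dictionary-driven forward maximum matching in which mixed-script dictionary entries are kept as single tokens. It is not mmseg4j's implementation or API: the class `MixedTermSegmenter` and its `segment` method are hypothetical names chosen for illustration, and the fallback of emitting one character per token is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not part of mmseg4j. */
public class MixedTermSegmenter {
    private final Set<String> dictionary;
    private final int maxTermLength;

    public MixedTermSegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        // Remember the longest entry so the matching window stays bounded.
        int max = 1;
        for (String term : dictionary) {
            max = Math.max(max, term.length());
        }
        this.maxTermLength = max;
    }

    /**
     * Forward maximum matching: at each position take the longest
     * dictionary entry that starts there, regardless of script;
     * if nothing matches, emit a single character.
     */
    public List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = Math.min(text.length(), pos + maxTermLength);
            String match = null;
            for (int e = end; e > pos; e--) {
                String candidate = text.substring(pos, e);
                if (dictionary.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                match = text.substring(pos, pos + 1);
            }
            tokens.add(match);
            pos += match.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("比亚迪F3", "马自达2", "T恤", "γ射线"));
        MixedTermSegmenter segmenter = new MixedTermSegmenter(dict);
        // prints [我, 买, 了, 比亚迪F3, 和, T恤]
        System.out.println(segmenter.segment("我买了比亚迪F3和T恤"));
    }
}
```

The point of the sketch is that the lookup never classifies characters by type before consulting the dictionary, so an entry such as "比亚迪F3" survives as one token once it has been added.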

GoogleCodeExporter commented 9 years ago
[deleted comment]

GoogleCodeExporter commented 9 years ago
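There are other combinations as well, such as Greek letters combined with Chinese, e.g. γ射线, α同位素, and β版. There are also cases where punctuation needs to be preserved, e.g. 《那些年,我们一起追的女孩儿》 and 《手机》.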

Original comment by sens...@gmail.com on 3 Oct 2012 at 4:09

GoogleCodeExporter commented 9 years ago
I've run into the same problem, for example with T恤 (T-shirt). I hope this can be supported.

Original comment by schum...@gmail.com on 19 Apr 2013 at 6:44

GoogleCodeExporter commented 9 years ago
I have this problem too and hope it gets supported.

Original comment by sling...@gmail.com on 23 May 2013 at 8:44

GoogleCodeExporter commented 9 years ago
I've hit the same problem too. I really hope mixed Chinese-English segmentation can be supported!

Original comment by 10674427...@gmail.com on 1 Apr 2014 at 2:18

GoogleCodeExporter commented 9 years ago
This calls for a word-mining program that works against the dictionary. Such a program could be developed.

Original comment by chenlb2...@gmail.com on 19 May 2014 at 11:48