waiteryee127 / mmseg4j

Automatically exported from code.google.com/p/mmseg4j
Apache License 2.0

Thread-safety problem in MMSeg word segmentation #30

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. We use the Elasticsearch plugin: https://github.com/medcl/elasticsearch-analysis-mmseg
2. We send data to an ES node for indexing.
3. Exceptions (NullPointerException and ConcurrentModificationException) occur on some of the data. We checked each field analyzed by mmseg and found no exception when running single-threaded.

The Elasticsearch log is shown below:
f82c878dee7","location":{"provinceId":"320000","cityId":"320500"},"address":"常熟市虞山镇富仓路8号","trade":{"id":"336","name":"医院","parentId":"12"},"name":"常熟三院(二级甲等)","namePinyin":[]}]}

java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
        at java.util.AbstractList$Itr.remove(AbstractList.java:357)
        at com.chenlb.mmseg4j.rule.Rule.remainChunks(Rule.java:41)
        at com.chenlb.mmseg4j.ComplexSeg.seg(ComplexSeg.java:93)
        at com.chenlb.mmseg4j.MaxWordSeg.seg(MaxWordSeg.java:19)
        at com.chenlb.mmseg4j.MMSeg.next(MMSeg.java:179)
        at com.chenlb.mmseg4j.analysis.MMSegTokenizer.incrementToken(MMSegTokenizer.java:62)
        at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:55)
        at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
        at org.apache.lucene.analysis.snowball.SnowballFilter.incrementToken(SnowballFilter.java:76)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:141)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
        at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:574)
        at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:486)
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:323)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:158)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

[2012-08-01 15:42:28,279][DEBUG][action.bulk              ] [Aftershock] [ptc][2] failed to execute bulk item (index) index {[ptc][dial][158f4902ad9e335d976f6dbe8b1841b1], source[{"id":"158f4902ad9e335d976f6dbe8b1841b1","location":{"provinceId":"430000","cityId":"430700"},"address":"常德市洞庭大道中段104号","trade":{"id":"336","name":"医院","parentId":"12"},"name":"常德市妇幼保健院(二级�等)","namePinyin":[]}]}

java.lang.NullPointerException
        at com.chenlb.mmseg4j.MMSeg.next(MMSeg.java:180)
        at com.chenlb.mmseg4j.analysis.MMSegTokenizer.incrementToken(MMSegTokenizer.java:62)
        at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:55)
        at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:60)
        at org.apache.lucene.analysis.snowball.SnowballFilter.incrementToken(SnowballFilter.java:76)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:197)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
        at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:574)
        at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:486)
        at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:323)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:158)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

Original issue reported on code.google.com by ww.wang.cs on 2 Aug 2012 at 5:06
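
Note on the first trace: `AbstractList$Itr.remove` throws `ConcurrentModificationException` when the list has been structurally changed by something other than that iterator since the iterator last synchronized with it. The sketch below is an illustration only and is not mmseg4j code. The class and method names are made up, and the "other thread" is simulated by a plain `add()` on the same thread so that the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class ComodificationSketch {

    /**
     * Returns true if Iterator.remove() fails because the list was changed
     * behind the iterator's back. The add() call stands in for a second
     * thread touching the same list (an assumption made for illustration).
     */
    static boolean removeFailsAfterOutsideChange() {
        List<String> chunks = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = chunks.iterator();
        it.next();        // iterator has returned "a"
        chunks.add("d");  // structural change made outside the iterator
        try {
            it.remove();  // detects the outside change and throws
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeFailsAfterOutsideChange());
    }
}
```

With real threads the same check fails only when the timing lines up, which would be consistent with the report that only some documents fail and that single-threaded runs are clean.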

GoogleCodeExporter commented 9 years ago
Our Elasticsearch version is 0.19.8.

Original comment by ww.wang.cs on 2 Aug 2012 at 5:07

GoogleCodeExporter commented 9 years ago
It is not thread-safe; wrap it in a ThreadLocal.

Original comment by chenlb2...@gmail.com on 20 Jan 2013 at 4:42
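
A minimal sketch of the ThreadLocal suggestion above, under stated assumptions: `StatefulSegmenter` is a hypothetical stand-in for whichever mmseg4j object holds the mutable state (the thread does not say which one), and its `segment` method is invented for the example. The point is only the pattern: each thread lazily gets its own instance, so no mutable state is shared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalSegSketch {

    /** Hypothetical stand-in for a segmenter that keeps mutable working state. */
    static final class StatefulSegmenter {
        private final List<String> buffer = new ArrayList<>(); // unsafe to share

        List<String> segment(String text) {
            buffer.clear();
            buffer.addAll(Arrays.asList(text.split(" ")));
            return new ArrayList<>(buffer); // copy, so callers never see the buffer
        }
    }

    // One segmenter per thread, created on that thread's first call to get().
    private static final ThreadLocal<StatefulSegmenter> SEGMENTER =
            ThreadLocal.withInitial(StatefulSegmenter::new);

    static List<String> segment(String text) {
        return SEGMENTER.get().segment(text);
    }

    /** Runs many segment() calls on a pool and reports whether every result was correct. */
    static boolean allResultsCorrect(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String text = "t" + i + " a b";
                Callable<List<String>> task = () -> segment(text);
                futures.add(pool.submit(task));
            }
            for (int i = 0; i < tasks; i++) {
                List<String> expected = Arrays.asList("t" + i, "a", "b");
                if (!futures.get(i).get().equals(expected)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(allResultsCorrect(4, 200));
    }
}
```

The trade-off is one instance per pool thread instead of one shared instance, which costs some memory but avoids locking on the indexing path.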