jprante / elasticsearch-analysis-decompound

Decompounding Plugin for Elasticsearch
GNU General Public License v2.0

java.lang.NumberFormatException: For input string: "" #2

Closed vlopato closed 11 years ago

vlopato commented 11 years ago

Hi!

Thanks for your plugin.

Sometimes I get this exception:

[2013-06-06 16:57:49,918][DEBUG][action.bulk              ] [Quantum] [2] failed to execute bulk item (index) index 
java.lang.NumberFormatException: For input string: "" 
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:504)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.elasticsearch.analysis.decompound.Decompounder.reduceToBaseForm(Decompounder.java:223)
    at org.elasticsearch.analysis.decompound.Decompounder.decompound(Decompounder.java:61)
    at org.elasticsearch.index.analysis.DecompoundTokenFilter.decompound(DecompoundTokenFilter.java:68)
    at org.elasticsearch.index.analysis.DecompoundTokenFilter.incrementToken(DecompoundTokenFilter.java:55)
    at org.apache.lucene.analysis.miscellaneous.UniqueTokenFilter.incrementToken(UniqueTokenFilter.java:55)
    at org.apache.lucene.analysis.de.GermanNormalizationFilter.incrementToken(GermanNormalizationFilter.java:57)
    at org.elasticsearch.common.lucene.all.AllTokenStream.incrementToken(AllTokenStream.java:57)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:202)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2328)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:583)
    at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:489)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:330)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:158)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:431)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)


I tried to debug your module but couldn't find anything. It happens intermittently when I run a bulk reindex (50-100 docs per request).

E.g. the first attempt crashes, but a second attempt with the same data works correctly.

Do you have any thoughts about the problem?

Thanks a lot anyway!

jprante commented 11 years ago

Thanks! Yes, a safeguard is missing in the reduceToBaseForm() method for the case where Lucene emits zero-length strings (which can happen with filters like UniqueTokenFilter). I will add a fix ASAP.
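
For illustration only, here is a minimal sketch of the kind of safeguard described above. It is not the plugin's actual code: the class name EmptyTokenGuard and the helper parseIntOrDefault are hypothetical, and the assumption (based on the stack trace) is simply that an empty string reached Integer.parseInt() inside reduceToBaseForm().

```java
// Hypothetical sketch; names are illustrative and do not come from the plugin.
final class EmptyTokenGuard {

    // Returns the parsed integer, or the fallback when the input is null,
    // empty, or not a valid decimal integer. Checking for the empty string
    // up front avoids NumberFormatException: For input string: "".
    static int parseIntOrDefault(String s, int fallback) {
        if (s == null || s.isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // An empty token no longer throws; the fallback is used instead.
        System.out.println(parseIntOrDefault("", 0));
        System.out.println(parseIntOrDefault("3", 0));
    }
}
```

An alternative with the same effect would be to skip zero-length tokens before they reach the decompounder at all; which of the two is appropriate depends on how the caller should treat an empty term.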

jippi commented 11 years ago

What's the status on this? I'm also seeing the problem.

jprante commented 11 years ago

Closed by 1ac2ca7d9f43a10be976a07707f0bc3248cbab55