pelias / geonames

Import pipeline for geonames into Pelias
https://pelias.io
MIT License

Index re-open failure after loading geonames #11

Closed heffergm closed 9 years ago

heffergm commented 9 years ago

Summary steps to reproduce:

Details:

[2014-10-29 16:54:31,878][DEBUG][action.bulk              ] [T-Ray] [pelias][0] failed to execute bulk item (index) index {[pelias][geoname][30466], source[{"name":{"default":"Khaysāyah"},"admin0":"Yemen","admin1":"Al Mahrah","admin2":"Huswain","center_point":{"lat":"15.63333","lon":"52.1"},"suggest":{"input":["khaysāyah"],"payload":{"id":"geoname/30466","geo":"52.1,15.63333"},"output":"Khaysāyah, Huswain, Al Mahrah"}}]}
org.elasticsearch.index.engine.IndexFailedEngineException: [pelias][0] Index failed for [geoname#30466]
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:499)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:409)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:446)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:535)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:434)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
    at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
    at org.apache.lucene.analysis.util.CharacterUtils.readFully(CharacterUtils.java:230)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:272)
    at org.apache.lucene.analysis.util.CharacterUtils.fill(CharacterUtils.java:220)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:137)
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
    at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java:104)
    at org.apache.lucene.analysis.pattern.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:69)
    at org.apache.lucene.analysis.synonym.SynonymFilter.parse(SynonymFilter.java:358)
    at org.apache.lucene.analysis.synonym.SynonymFilter.incrementToken(SynonymFilter.java:624)
    at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:237)
    at org.apache.lucene.analysis.PrefixAnalyzer$PrefixTokenFilter.incrementToken(PrefixAnalyzer.java:109)
    at org.apache.lucene.analysis.PrefixAnalyzer$PrefixTokenFilter.incrementToken(PrefixAnalyzer.java:109)
    at org.apache.lucene.analysis.TokenStreamToAutomaton.toAutomaton(TokenStreamToAutomaton.java:122)
    at org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester.toFiniteStrings(XAnalyzingSuggester.java:901)
    at org.elasticsearch.search.suggest.completion.AnalyzingCompletionLookupProvider.toFiniteStrings(AnalyzingCompletionLookupProvider.java:371)
    at org.elasticsearch.search.suggest.completion.CompletionTokenStream.incrementToken(CompletionTokenStream.java:63)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:604)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1254)
    at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:563)
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:492)
    ... 8 more
   { index:
      { _index: 'pelias',
        _type: 'geoname',
        _id: '30467',
        status: 500,
        error: 'IndexFailedEngineException[[pelias][1] Index failed for [geoname#30467]]; nested: IllegalStateException[TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.]; ' } } }
{ [Error: unknown response code]
  message: 'unknown response code',
  req:
   { name: { default: 'Ra’s Ḑarbat ‘Alī' },
     admin0: 'Yemen',
     admin1: 'Al Mahrah',
     admin2: 'Hawf',
     center_point: { lat: '16.63333', lon: '53' },
     suggest:
      { input: [Object],
        payload: [Object],
        output: 'Ra’s Ḑarbat ‘Alī, Hawf, Al Mahrah' } },
  res:
   { index:
      { _index: 'pelias',
        _type: 'geoname',
        _id: '30468',
        status: 500,
        error: 'IndexFailedEngineException[[pelias][2] Index failed for [geoname#30468]]; nested: IllegalStateException[TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.]; ' } } }
{ [Error: unknown response code]
  message: 'unknown response code',
  req:
   { name: { default: 'Jibāl al Qamar' },
     admin0: 'Yemen',
     admin1: 'Al Mahrah',
     admin2: 'Shahan',
     center_point: { lat: '16.83333', lon: '53' },
     suggest:
      { input: [Object],
        payload: [Object],
        output: 'Jibāl al Qamar, Shahan, Al Mahrah' } },
  res:
   { index:
      { _index: 'pelias',
        _type: 'geoname',
        _id: '30469',
        status: 500,
        error: 'IndexFailedEngineException[[pelias][3] Index failed for [geoname#30469]]; nested: IllegalStateException[TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.]; ' } } }
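
For anyone triaging a similar import, here is a minimal Python sketch of pulling the failed items out of a bulk response. It assumes the standard Elasticsearch bulk response shape (an `items` list of single-key objects like the `index` entries above) and that `error` is a plain string, as in this output. The function name `failed_bulk_items` and the sample dictionary are illustrative only and are not part of the pelias importer; the sample error text is abbreviated.

```python
from typing import Any, Dict, List


def failed_bulk_items(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarise every bulk item whose status is 400 or higher."""
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key object named after its action, e.g. "index".
        for action, result in item.items():
            if result.get("status", 200) >= 400:
                error_text = str(result.get("error", ""))
                failures.append(
                    {
                        "action": action,
                        "id": result.get("_id"),
                        "status": result["status"],
                        # Keep only the innermost cause, which follows the last "nested:".
                        "cause": error_text.split("nested: ")[-1].strip(" ;"),
                    }
                )
    return failures


# Sample shaped like the failure reported above (assumption: error text abbreviated).
sample = {
    "errors": True,
    "items": [
        {
            "index": {
                "_index": "pelias",
                "_type": "geoname",
                "_id": "30467",
                "status": 500,
                "error": "IndexFailedEngineException[[pelias][1] Index failed for "
                "[geoname#30467]]; nested: IllegalStateException[TokenStream "
                "contract violation]; ",
            }
        }
    ],
}

for failure in failed_bulk_items(sample):
    print(failure["id"], failure["status"], failure["cause"])
```

Grouping failures by `cause` makes it easy to see that every rejected document here hits the same TokenStream exception rather than a data problem in individual records.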

The bigger issue is that the shards then fail to recover when the index is re-opened:

[2014-10-29 16:58:09,713][WARN ][indices.cluster          ] [Challenger] [pelias][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [pelias][2] failed to recover shard
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:269)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.engine.IndexFailedEngineException: [pelias][2] Index failed for [geoname#30477]
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:499)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:769)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
    ... 4 more
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
    at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
    at org.apache.lucene.analysis.util.CharacterUtils.readFully(CharacterUtils.java:230)
    at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:272)
    at org.apache.lucene.analysis.util.CharacterUtils.fill(CharacterUtils.java:220)
    at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:137)
    at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
    at org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter.incrementToken(ASCIIFoldingFilter.java:104)
    at org.apache.lucene.analysis.pattern.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:69)
    at org.apache.lucene.analysis.synonym.SynonymFilter.parse(SynonymFilter.java:358)
    at org.apache.lucene.analysis.synonym.SynonymFilter.incrementToken(SynonymFilter.java:624)
    at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:237)
    at org.apache.lucene.analysis.PrefixAnalyzer$PrefixTokenFilter.incrementToken(PrefixAnalyzer.java:109)
    at org.apache.lucene.analysis.PrefixAnalyzer$PrefixTokenFilter.incrementToken(PrefixAnalyzer.java:109)
    at org.apache.lucene.analysis.TokenStreamToAutomaton.toAutomaton(TokenStreamToAutomaton.java:122)
    at org.apache.lucene.search.suggest.analyzing.XAnalyzingSuggester.toFiniteStrings(XAnalyzingSuggester.java:901)
    at org.elasticsearch.search.suggest.completion.AnalyzingCompletionLookupProvider.toFiniteStrings(AnalyzingCompletionLookupProvider.java:371)
    at org.elasticsearch.search.suggest.completion.CompletionTokenStream.incrementToken(CompletionTokenStream.java:63)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:604)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:342)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:301)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:451)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1539)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1254)
    at org.elasticsearch.index.engine.internal.InternalEngine.innerIndex(InternalEngine.java:563)
    at org.elasticsearch.index.engine.internal.InternalEngine.index(InternalEngine.java:492)
    ... 6 more
[2014-10-29 16:58:09,720][WARN ][cluster.action.shard     ] [Challenger] [pelias][3] sending failed shard for [pelias][3], node[gsDJsFt2QO--0cgAju9XHA], [P], s[INITIALIZING], indexUUID [wvGLN_xWSc6kIHqjXetG6A], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[pelias][3] failed to recover shard]; nested: IndexFailedEngineException[[pelias][3] Index failed for [geoname#30470]]; nested: IllegalStateException[TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.]; ]]

missinglink commented 9 years ago

I failed to reproduce this error until I upgraded the pelias plugin to the latest version using the following command:

sudo bin/plugin -url https://github.com/pelias/elasticsearch-plugin/blob/1.3.4/pelias-analysis.zip?raw=true -install pelias-analysis

Once I upgraded, I simply restarted the process and saw this in /var/log/elasticsearch/elasticsearch.log:

[2014-10-30 11:38:35,724][WARN ][cluster.action.shard     ] [Super-Nova] [pelias][3] sending failed shard for [pelias][3], node[5_nyFKfGScKUBBxIXc7a1A], [P], s[INITIALIZING], indexUUID [tZ5dxxQPT6eb258ZEJ56cw], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[pelias][3] failed to recover shard]; nested: IndexFailedEngineException[[pelias][3] Index failed for [geoname#8045005]]; nested: IllegalStateException[TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.]; ]]
[2014-10-30 11:38:35,724][WARN ][cluster.action.shard     ] [Super-Nova] [pelias][3] received shard failed for [pelias][3], node[5_nyFKfGScKUBBxIXc7a1A], [P], s[INITIALIZING], indexUUID [tZ5dxxQPT6eb258ZEJ56cw], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[pelias][3] failed to recover shard]; nested: IndexFailedEngineException[[pelias][3] Index failed for [geoname#8045005]]; nested: IllegalStateException[TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.]; ]]
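
Log lines like the ones above name both the shard and the document that failed to replay during recovery. A rough Python sketch for collecting those pairs from a log file follows. The regular expression is written against the `[index][shard] Index failed for [type#id]` fragment seen in this thread and is an assumption about the log format, not a documented Elasticsearch interface; `failed_docs_by_shard` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, Set, Tuple

# Matches fragments such as "[pelias][3] Index failed for [geoname#8045005]".
INDEX_FAILED = re.compile(
    r"\[(?P<index>[^\]\[]+)\]\[(?P<shard>\d+)\] "
    r"Index failed for \[(?P<type>[^#\]]+)#(?P<id>[^\]]+)\]"
)


def failed_docs_by_shard(log_text: str) -> Dict[Tuple[str, int], Set[str]]:
    """Map (index, shard) to the set of "type/id" documents that failed on it."""
    failures = defaultdict(set)
    for match in INDEX_FAILED.finditer(log_text):
        key = (match.group("index"), int(match.group("shard")))
        failures[key].add(match.group("type") + "/" + match.group("id"))
    return dict(failures)
```

Using a set per shard collapses the paired "sending failed shard" and "received shard failed" lines, which report the same document twice.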

@hkrishna might know more about how to solve this. My suspicion is that a Java class inside the plugin extends TokenStream and either does not implement reset() or does not call super.reset(), as the following message suggests (I looked but couldn't find the class):

Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
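
To illustrate the kind of bug suspected here, below is a toy Python model of the contract the message describes. It is not Lucene code and none of these class names exist in Lucene or in the plugin; they are hypothetical stand-ins. The sketch only assumes what the message itself states: a consumer must call reset() before reading tokens, and a subclass that overrides reset() must forward the call to the stream it wraps. Lucene's real workflow also includes end() and close(), which are left out for brevity.

```python
class ToyTokenizer:
    """Source of tokens; reading before reset() is treated as a contract violation."""

    def __init__(self, text):
        self._text = text
        self._tokens = None  # no usable input until reset() is called

    def reset(self):
        self._tokens = iter(self._text.split())

    def increment_token(self):
        if self._tokens is None:
            raise RuntimeError("contract violation: reset() call missing")
        return next(self._tokens, None)  # None signals end of stream


class ToyFilter:
    """Wraps another stream; reset() must be forwarded to the wrapped stream."""

    def __init__(self, upstream):
        self._upstream = upstream

    def reset(self):
        self._upstream.reset()

    def increment_token(self):
        return self._upstream.increment_token()


class ToyLowerCaseFilter(ToyFilter):
    def increment_token(self):
        token = super().increment_token()
        return None if token is None else token.lower()


class BrokenFilter(ToyFilter):
    def reset(self):
        self.seen = 0  # resets its own state but forgets super().reset()


def consume(stream):
    """Consumer side of the contract: reset() first, then read until exhausted."""
    stream.reset()
    tokens = []
    token = stream.increment_token()
    while token is not None:
        tokens.append(token)
        token = stream.increment_token()
    return tokens
```

In this model, `consume(ToyLowerCaseFilter(ToyTokenizer("Jibāl al Qamar")))` returns the lower-cased tokens, while `consume(BrokenFilter(ToyTokenizer("Jibāl al Qamar")))` raises `RuntimeError` because the tokenizer at the bottom of the chain was never reset. That is the same shape as the failure in the stack trace, where the exception surfaces in the tokenizer's read call rather than in the filter that caused it.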

missinglink commented 9 years ago

Last time I compiled it myself, so I compiled it again without pulling the repo. I'm at commit d1214b109c8395730c64ae552b2684b0847e44c5.

Confirmed: reverting to an older version of the plugin fixes the issue, and the indices re-open without corruption.

missinglink commented 9 years ago

Closing in favour of https://github.com/pelias/elasticsearch-plugin/issues/4