Closed: jmreymond closed this issue 8 years ago.
This issue is caused by PDF parsing during indexing, most likely in the attachment mapper plugin, not in knapsack.
Please update to the latest attachment mapper plugin version, if possible, or open an issue there.
If this is a memory problem, consider switching off replicas for the knapsack import (the error occurs during replica processing).
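Replicas can be switched off with a dynamic settings update on the index before the import. A minimal sketch of building that request with the standard JDK HTTP client; the index name `myindex` and the `localhost:9200` address are placeholders, while the `_settings` endpoint and `number_of_replicas` setting are standard Elasticsearch:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DisableReplicas {
    // Builds a PUT /<index>/_settings request that sets number_of_replicas to 0.
    static HttpRequest buildRequest(String index) {
        String body = "{\"index\":{\"number_of_replicas\":0}}";
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:9200/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .method("PUT", HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("myindex");
        // Sending requires a running cluster, e.g.:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Setting it back to the original replica count after the import restores redundancy without re-copying the data.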
The source index is correct and has been indexed by Elasticsearch. I could remove the offending document, but the log message does not name it, which makes it very difficult to find and remove.
As said, this error is not caused by the knapsack plugin; it occurs while the PDF is being processed.
I copied an index within the local cluster, and the process stopped with this message:
```
[2015-09-28 08:26:00,008][WARN ][org.apache.pdfbox.pdfparser.PDFParser] Parsing Error, Skipping Object
java.io.IOException: Push back buffer is full
	at java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
	at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:143)
	at org.apache.pdfbox.io.PushBackInputStream.unread(PushBackInputStream.java:132)
	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:572)
	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:203)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1239)
	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1204)
	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:133)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.Tika.parseToString(Tika.java:506)
	at org.elasticsearch.index.mapper.attachment.AttachmentMapper.parse(AttachmentMapper.java:446)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
	at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:493)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:409)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
Is it possible to catch the error and continue?
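Neither knapsack nor the attachment mapper is confirmed to expose such an option in this thread, but the skip-and-continue pattern being asked about can be sketched as follows. The `parse` method here is a hypothetical stand-in for the attachment mapper's Tika call, not the plugin's actual API:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SkipBrokenDocs {
    // Hypothetical stand-in for the Tika-based attachment parsing,
    // which throws IOException on malformed PDFs.
    static String parse(String doc) throws IOException {
        if (doc.contains("broken")) {
            throw new IOException("Push back buffer is full");
        }
        return doc.toUpperCase();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a.pdf", "broken.pdf", "b.pdf");
        List<String> indexed = new ArrayList<>();
        for (String doc : docs) {
            try {
                indexed.add(parse(doc));
            } catch (IOException e) {
                // Log and skip the offending document instead of aborting the import.
                System.err.println("skipping " + doc + ": " + e.getMessage());
            }
        }
        System.out.println(indexed);
    }
}
```

The catch block would have to live inside the mapper or the import loop; since the exception is thrown deep inside `AttachmentMapper.parse`, it cannot be intercepted from outside the plugin.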