oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

new failure seen while indexing #2778

Closed — cross closed this issue 5 years ago

cross commented 5 years ago

I am running HEAD opengrok from the tree as of late last week, identifying itself as version 1.2.8.

I am seeing an error now while doing a fresh index (all data files deleted) of a set of source projects, and I think I had indexed this same set before without seeing this error. It is on the same host, and the only recent change was increasing the max heap size args from -Xms2g -Xmx12g to -Xms4g -Xmx16g. I doubt that's related, but I can back that out and test again.

The error I'm seeing is

16:10:23 WARNING: ERROR addFile(): /src/myproject1/third-party/maven/org/apache/maven/reporting/maven-reporting-impl/2.3/maven-reporting-impl-2.3.jar
java.lang.IllegalStateException: TokenStream contract violation: close() call missing   
        at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
        at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:412)
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
        at org.apache.lucene.document.Field.tokenStream(Field.java:513)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:787)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:430)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1235)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:780)
        at org.opengrok.indexer.index.IndexDatabase.lambda$null$1(IndexDatabase.java:1193)
        at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$59(Collectors.java:1297)
        at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:575)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
        at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)  
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.helpCC(ForkJoinPool.java:1115)
        at java.base/java.util.concurrent.ForkJoinPool.awaitJoin(ForkJoinPool.java:1687)
        at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:411)
        at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)  
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:575)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$2(IndexDatabase.java:1182)
        at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)  
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)

Searching, I found only #2030, about a failing test, but I did an mvnw unit-test run on this build without seeing an issue. (And I think #2030 shows that the test that provoked that issue was removed.)

Let me know if I can provide any more information on this. I have seen these errors a few times in recent days, but am not 100% sure how to reproduce it. I have reduced my project count down to something I know I had seen working earlier.

vladak commented 5 years ago

#2030 initially tracked just the test case failure; over time it became a place to collect information about the TokenStream contract violation in general, so I'm closing this as a dup.

I am not sure what the net effect of the exception is; I believe in the worst case it will just skip the contents of the file (i.e. they will not be indexed and therefore will not be searchable).
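For context on what "TokenStream contract violation: close() call missing" means: Lucene's `Tokenizer.setReader()` enforces a reuse contract under which a consumer must `close()` a stream after finishing with it before the tokenizer can be pointed at a new reader. A minimal sketch of that state machine follows — this is illustrative code written for this thread, not Lucene's actual implementation, and the class names here (`FakeTokenizer` etc.) are made up:

```java
// Illustrative-only sketch of the reuse state machine behind
// "TokenStream contract violation: close() call missing".
// Not Lucene's code; names are hypothetical.
class FakeTokenizer {
    private boolean closed = true;

    // Analogous to Tokenizer.setReader(): only legal when the previous
    // stream has been closed.
    void setReader(String input) {
        if (!closed) {
            throw new IllegalStateException(
                "TokenStream contract violation: close() call missing");
        }
        closed = false;
    }

    // Analogous to TokenStream.close(): returns to the reusable state.
    void close() {
        closed = true;
    }
}

public class TokenizerContractDemo {
    public static void main(String[] args) {
        FakeTokenizer t = new FakeTokenizer();
        t.setReader("first file");   // OK
        t.close();                   // contract satisfied
        t.setReader("second file");  // OK again

        boolean threw = false;
        try {
            t.setReader("third file"); // close() was skipped -> violation
        } catch (IllegalStateException e) {
            threw = true;
        }
        System.out.println("violation detected: " + threw);
    }
}
```

So a missing `close()` on one code path does not necessarily corrupt anything by itself; it surfaces as an exception the next time the same analyzer component is reused, which is why the error shows up at `addFile()` time for an otherwise unremarkable file.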

vladak commented 5 years ago

What would be useful to know is how often this happens. The bug seems to be tied to .jar files only. Does it happen for the same set of files (I assume the project in question contains a bunch of .jar files) if you reindex from scratch multiple times in a row, just to be sure?

cross commented 5 years ago

Okay. Well, this server is still in bring-up, so I can run that a few times. I've deleted the data directory and will start the index. I'll do that a couple of times and compare the outputs.
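One way to compare runs is to extract just the failing file paths from each run's indexer log and diff them. A sketch (the log excerpts below are fabricated samples in the format of the warning above; substitute your real captured logs for `run1.log` and `run2.log`):

```shell
# Hypothetical sample logs standing in for two fresh-reindex runs.
cat > run1.log <<'EOF'
16:10:23 WARNING: ERROR addFile(): /src/p1/a.jar
16:11:05 WARNING: ERROR addFile(): /src/p1/b.jar
EOF
cat > run2.log <<'EOF'
16:40:12 WARNING: ERROR addFile(): /src/p1/a.jar
EOF

# Pull out just the failing paths from each run, sorted for comm.
grep 'ERROR addFile()' run1.log | awk '{print $NF}' | sort > run1.paths
grep 'ERROR addFile()' run2.log | awk '{print $NF}' | sort > run2.paths

# Paths that failed in only one of the two runs; empty output means
# the failure set is deterministic across reindexes.
comm -3 run1.paths run2.paths
```

If the failure set is identical across fresh reindexes, that points at something file-specific; if it varies, it suggests a concurrency problem in the parallel indexing path shown in the stack trace.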

vladak commented 5 years ago

Cool, please update #2030 with your findings.