tarangchikhalia closed this issue 4 months ago
How is the indexer run? Was this an initial or incremental reindex? Is the directory in question part of some repository?
This is an incremental reindex. The directory is part of a repository which is copied from a remote server to the OpenGrok server (no SCM), but I have seen this error in many Git repositories.
Can you raise the indexer log level to `FINER` or higher and post the logs around the log entries that start with `Starting file collection` and such, for a case which encounters the directory problem? That line and any subsequent lines that contain `DefaultIndexChangedListener` would help.
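A note on raising the log level: the indexer logs through `java.util.logging`, so a minimal properties file along these lines (a sketch; the file path is passed to the indexer JVM via `-Djava.util.logging.config.file=/path/to/logging.properties`) raises everything under `org.opengrok` to `FINEST`:

```properties
# Minimal java.util.logging configuration sketch.
handlers = java.util.logging.ConsoleHandler
# The handler must also let fine-grained records through.
java.util.logging.ConsoleHandler.level = FINEST
# Raise only the OpenGrok loggers.
org.opengrok.level = FINEST
```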
Here are the logs with `FINEST` settings:
```
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase logIgnoredUid
FINEST: ignoring deleted document for '/<project>/version.json' at 20240106111117766
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.DefaultIndexChangedListener fileRemove
FINE: Remove: '/<project>/version.json'
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.DefaultIndexChangedListener fileRemoved
FINER: Removed: '/<project>/version.json'
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.util.Statistics logIt
INFO: Done file collection for directory '/<project>' (took 15 ms)
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase update
INFO: Starting indexing of directory '/<project>'
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4
WARNING: ERROR addFile(): '/var/opt/opengrok/<dir_path>'
java.io.FileNotFoundException: /var/opt/opengrok/<dir_path> (Is a directory)
    at java.base/java.io.FileInputStream.open0(Native Method)
    at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
    at java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
    at org.opengrok.indexer.index.IndexDatabase.getAnalyzerFor(IndexDatabase.java:1217)
    at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:1129)
    at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1781)
    at java.base/java.util.stream.Collectors.lambda$groupingByConcurrent$59(Collectors.java:1304)
    at java.base/java.util.stream.ReferencePipeline.lambda$collect$1(ReferencePipeline.java:575)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:408)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:661)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:575)
    at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$5(IndexDatabase.java:1770)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Jan 11, 2024 3:21:46 PM org.opengrok.indexer.index.IndexDatabase lambda$indexParallel$4
```
Can you also provide the line that contains `Starting file collection`?
I went through the related code in `IndexDatabase` and for the initial reindex I don't see a way for there to be an entry in the `IndexDownArgs` that would correspond to a directory. The `indexDown()` recursive function that is executed when reindexing from scratch (or when history based reindex is off for some reason) traverses the directory tree like this: https://github.com/oracle/opengrok/blob/b2383942c7ea3e938f62f66521a19ce61293b0a5/opengrok-indexer/src/main/java/org/opengrok/indexer/index/IndexDatabase.java#L1629-L1641
The `accept()` call detects any allowed symlinks. Since `isDirectory()` follows symlinks, even if the file is a forbidden symlink it will still be processed in the else branch as a directory, i.e. `indexDown()` will recursively descend into that directory. `IndexDownArgs` is modified (within this code path) only in the `processFile()` method, and that method is always called for non-directory entries.
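The symlink-following behavior is easy to reproduce outside the indexer. The standalone demo below (not OpenGrok code) shows that `File.isDirectory()` reports `true` for a symlink pointing at a directory, and that opening such an entry with `FileInputStream`, as the `getAnalyzerFor()` frame in the stack trace ultimately does, fails with exactly the `(Is a directory)` message from the log above:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkDirDemo {

    /** Open 'f' for reading the way an analyzer would; return the failure message, or "opened". */
    static String openAsFile(File f) {
        try (FileInputStream in = new FileInputStream(f)) {
            return "opened";
        } catch (FileNotFoundException e) {
            return e.getMessage();  // e.g. "/tmp/... (Is a directory)" on Linux
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Throwaway layout: a real directory plus a symlink pointing at it.
        Path tmp = Files.createTempDirectory("symlink-demo");
        Path realDir = Files.createDirectory(tmp.resolve("realdir"));
        Path link = tmp.resolve("link");
        Files.createSymbolicLink(link, realDir);

        // File.isDirectory() follows symlinks, so the link looks like a directory...
        System.out.println(link.toFile().isDirectory());
        // ...but with NOFOLLOW_LINKS the link itself is not one.
        System.out.println(Files.isDirectory(link, LinkOption.NOFOLLOW_LINKS));
        // Opening it as a file fails just like addFile() did in the log.
        System.out.println(openAsFile(link.toFile()));
    }
}
```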
`IndexDownArgs` is further modified in `processTrailingTerms()` from within `update()`, however that only happens for pre-existing index documents.
The history based reindex (which is always non-initial), done in `indexDownUsingHistory()`, is a different story. There the `accept()` call that identifies allowed symlinks is not used, so it could happen that `processFileIncremental()`, which is the workhorse for this indexing mode, actually adds an `IndexDownArgs` entry that is a directory. For Git specifically, I don't think the Git file tree traversal could contain directories (since in Git a directory can be added to the Git index only if non-empty), however if the entry is a symlink pointing to a directory, that is possible.
That's why I asked about the `Starting file collection` log entry, so that I can see which indexing mode this happens for.
Sorry for the delay. The project that was encountering this issue isn't showing it now. I am trying to reproduce it in a test environment.
It definitely depends on the changes done since the last reindex. For a history based reindex, that would be the file trees in the newly added changesets.
The OpenGrok indexer is throwing `FileNotFoundException` on some directories while indexing.

- OpenGrok version: 1.12.12
- Tomcat: 10.1.x
- JDK: 11
- OS: Oracle Linux 8.8