oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/
Other
4.33k stars 746 forks source link

do not attempt to retrieve history using the repository method for files without history cache entry for repos with history for directories #759

Open vladak opened 10 years ago

vladak commented 10 years ago

I was playing with indexing on OS X (for #687) with one of my repos and discovered that untracked files cause unhandled exceptions in the phase of generating Lucene index (from addFile()). It would be nice to catch these exceptions and generate log message with WARNING or maybe INFO level.

tarzanek commented 10 years ago

What does luke say? Were those files added? If yes, then they were added without history which is desired behaviour - and the stack goes into fine level i think ....

vladak commented 10 years ago

The actual output looks like this:

11:44:51 WARNING: Non-zero exit status 255 from command [/usr/bin/hg, log, -f, --template, changeset: {rev}:{node|short}\n{branches}{tags}{parents}\nuser: {author}\ndate: {date|isodate}\nfiles: {files}\ndescription: {desc|strip|obfuscate}\n, race/test.dat] in directory /Users/vladimirkotal: abort: cannot follow file not in parent revision: "race/test.dat"
11:44:51 WARNING: An error occurred while reading history: 
org.opensolaris.opengrok.history.HistoryException: Failed to get history for: "/private/var/tmp/base/src/unix-prog-src/race/test.dat" Exit code: 255
    at org.opensolaris.opengrok.history.MercurialHistoryParser.parse(MercurialHistoryParser.java:79)
    at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:501)
    at org.opensolaris.opengrok.history.MercurialRepository.getHistory(MercurialRepository.java:494)
    at org.opensolaris.opengrok.history.FileHistoryCache.get(FileHistoryCache.java:307)
    at org.opensolaris.opengrok.history.HistoryGuru.getHistory(HistoryGuru.java:212)
    at org.opensolaris.opengrok.history.HistoryGuru.getHistoryReader(HistoryGuru.java:175)
    at org.opensolaris.opengrok.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:284)
    at org.opensolaris.opengrok.index.IndexDatabase.addFile(IndexDatabase.java:591)
    at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:842)
    at org.opensolaris.opengrok.index.IndexDatabase.indexDown(IndexDatabase.java:812)
    at org.opensolaris.opengrok.index.IndexDatabase.update(IndexDatabase.java:379)
    at org.opensolaris.opengrok.index.IndexDatabase$1.run(IndexDatabase.java:168)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

I am not sure why there has to be the exception stack trace at all (for this specific case).

vladak commented 5 months ago

With the recent Mercurial (6.1.1) the hg log no longer exits with non-zero code when the file is untracked, even with -f/--follow (which current OpenGrok version does not use unless renamed support is on for given repository) so the exception is no longer reported.

That said, this is a question of efficiency. The hg log command is executed which will slow the indexer down. For such repository types, the fallback should be false when calling getHistory() to avoid it because when -H is used the history cache should be generated for such repositories (i.e. these with the capability to retrieve history for directories) in the 1st phase of indexing.

For repositories with hasHistoryForDirectories() returning false (e.g. SCCS), this has to still work because for these the history will be only stored in the 2nd phase of the indexing via the populateDocumentHistory().

vladak commented 5 months ago

Actually, the check is already being done in HistoryGuru#getHistoryFromRepository(): https://github.com/oracle/opengrok/blob/5d55b5f08d930049059a72ad2c2317584e55e94b/opengrok-indexer/src/main/java/org/opengrok/indexer/history/HistoryGuru.java#L547-L560

The issue is that the fetchHistoryWhenNotInCache configuration tunable is set to true by default so the history is retrieved also for repositories which support history per directory. Maybe the tunable should be set to false by default and that is why I am keeping this issue open.

That said, the tunable might go away as suggested in https://github.com/oracle/opengrok/issues/1498#issuecomment-299310389