oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

OpenGrok seems to cache search results after deleting index + repo content #990

Open samsongli opened 8 years ago

samsongli commented 8 years ago

I run a project using OpenGrok 0.12.1 and repeat the following steps:

  1. add new source tree to a repo
  2. let OpenGrok index it
  3. search the source
  4. delete the content of the repo along with its index
  5. create a new source tree and go to step 1
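Since the whole loop is driven programmatically, step 4 might look something like the following minimal Java sketch. It is an assumption, not the reporter's actual code: the class name `ResetRepo` and the convention of passing the source root and the OpenGrok data root as arguments are hypothetical. It only removes files on disk and does nothing to the running webapp or its open Lucene readers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for step 4: wipe a source tree and its index directory. */
final class ResetRepo {

    /** Deletes root and everything below it; does nothing if root does not exist. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse lexicographic order puts children ("a/b") before parents ("a").
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : ordered) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: args are the source root and the OpenGrok data root.
        for (String arg : args) {
            deleteRecursively(Path.of(arg));
        }
    }
}
```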

When I search the new source tree after asking OpenGrok to index it, I often get odd results that seem to come from a cache of the previous source tree, which was just deleted.

I believe these search results come from a cache in either Lucene or OpenGrok (if OpenGrok caches search history). I tried to clear the cache either by stopping the OpenGrok application in Tomcat or by restarting Tomcat; neither worked well for me.

Is there any way to ask OpenGrok or Lucene to purge the cache?

tarzanek commented 8 years ago

Isn't this a local browser cache?

Try Ctrl+Shift+R, or hold Shift and click the refresh button ...

samsongli commented 8 years ago

I was using the JSON interface to retrieve results; in fact the entire process is done programmatically, so it is not the browser cache. It should be either the Lucene cache or an OpenGrok cache, if there is one.

tarzanek commented 8 years ago

I think the problem is somewhere else. After a reindex, OpenGrok will (should) drop the old Lucene documents, and those are the only source of hits. Beyond that, only the webapp server (Tomcat) or the client hold caches, and they do so only for xrefs; search results should always be dynamic. Restarting the webapp server might also help (but shouldn't be needed!). So unless we have a bug in this document deletion, I doubt we cache anything anywhere else. Naturally there can be bugs in the JSON client (I have never tested it personally, so I'm not sure what shape it is in). I will try to reproduce this; it might be that such a deletion bug crept in somewhere with the Lucene upgrades ...

samsongli commented 8 years ago

Currently I restart Tomcat each time I delete the source repo and its index, which mitigates the problem. I tried restarting the OpenGrok webapp within Tomcat (without stopping Tomcat itself), but the problem persists. My guess is that either Tomcat or Lucene is caching something.

tarzanek commented 8 years ago

So this is definitely Tomcat. But it's a bit weird, so this must come from the JSON servlet itself. Most probably the JSON servlet doesn't set the proper headers. This needs to be fixed: can you try setting the correct headers the same way the other servlets do? See the developer how-to on getting the source and building the webapp in NetBeans ... or the JSON servlet has an Eclipse subproject too ...
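For reference, a minimal sketch of what "proper headers" could mean for a JSON endpoint. It assumes that disabling HTTP caching with `Cache-Control` (plus `Pragma` for HTTP/1.0 caches) is what is wanted; it is not taken from the OpenGrok servlets. To stay self-contained it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` in place of a servlet container; in a servlet the equivalent call is `response.setHeader("Cache-Control", "no-store, no-cache, must-revalidate")`. The class name `NoCacheJson` and the `/json` path are hypothetical.

```java
import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Sketch: a JSON endpoint whose responses must not be stored or reused by caches. */
final class NoCacheJson {

    /** Headers telling browsers and proxies not to store or reuse the response. */
    static void addNoCacheHeaders(Headers h) {
        h.set("Cache-Control", "no-store, no-cache, must-revalidate");
        h.set("Pragma", "no-cache"); // honored by HTTP/1.0 caches
    }

    /** Starts a server on 127.0.0.1:port (0 = any free port) serving /json. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/json", exchange -> {
            byte[] body = "{\"results\":[]}".getBytes(StandardCharsets.UTF_8);
            addNoCacheHeaders(exchange.getResponseHeaders());
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

These headers only affect HTTP caches between the server and the client; they cannot make a stale Lucene reader on the server return fresh hits.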

samsongli commented 8 years ago

Thanks, I am actually running a local build with the JSON servlet modified to fit my needs. Which header should I change? When I stop the OpenGrok webapp, shouldn't the JSON servlet be stopped as well (and its cache cleared)? Or is the JSON servlet treated as a separate webapp?

tarzanek commented 8 years ago

Yes, the JSON servlet should be stopped, but then I am not sure about the Tomcat cache. Also double-check the Lucene readers opened by the JSON servlet. This could also be a view on the index which could still show deleted documents ... and the JSON servlet needs to refresh its readers ...

rstrlcpy commented 5 years ago

Any update? I use v1.0.

I removed the directories data/xref/, data/index/ and data/historycache/ and restarted Tomcat, but I still see the old content.

I also tried clearIndex, with no effect.

vladak commented 5 years ago

What Tomcat version is this? Does this happen with 1.1.x?

rstrlcpy commented 5 years ago

Tomcat 8.5.35. I didn't try 1.1.x.

vladak commented 5 years ago

I can replicate this on Tomcat running on localhost with OpenGrok 1.2.1. The "trouble" is that the data for the xref file/history/annotate views is regenerated on the fly. Search still returns hits from the deleted index because Lucene indexes are usually memory-mapped. Even using the system/refresh RESTful API endpoint makes no difference: it fails with a Lucene exception, so most likely the index files remain mapped (pmap reports the index files as still present, with a (deleted) suffix). This is just normal behavior, at least on Unix systems.
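The mapped-file behavior can be shown with the JDK alone. This sketch assumes a Unix-like OS (Linux, macOS); on Windows `Files.delete` fails while the file is mapped. It maps a file, deletes it, and then reads the mapping: the old bytes are still there. This is the same mechanism by which a Lucene `MMapDirectory` (which `FSDirectory.open` usually picks on 64-bit JVMs) can keep serving a deleted index for as long as a reader holds the mapping. The class name `MappedAfterDelete` and the file contents are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: a memory-mapped file stays readable after its name is deleted (Unix). */
final class MappedAfterDelete {

    /** Maps the whole file read-only; the mapping outlives the closed channel. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Maps file, deletes it, then returns what the mapping still contains. */
    static String readAfterDelete(Path file) throws IOException {
        MappedByteBuffer buf = map(file);
        Files.delete(file); // unlinks the name; the mapped pages remain valid
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("segment", ".bin");
        Files.writeString(f, "old index data");
        String content = readAfterDelete(f);
        System.out.println("exists=" + Files.exists(f) + ", mapped content=" + content);
    }
}
```

While such a mapping is alive, Linux lists it in `/proc/<pid>/maps` (and therefore in `pmap`) with a `(deleted)` suffix.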

Maybe we should go back to the very start and figure out what the use case was.