jenkinsci / lucene-search-plugin

Jenkins plugin for searching job data via Lucene or Solr
https://plugins.jenkins.io/lucene-search
MIT License

use local reader and close them after use #16

Closed: hmarkc closed this 3 years ago

hmarkc commented 3 years ago

updateReader() is not safe to use because it keeps opening new IndexReader instances without closing the old ones. Depending on which Directory implementation the IndexReader uses, this eventually leads to a "Too many open files" exception or a "not enough unfragmented virtual memory address" exception. The issue is unlikely to show up with a small sample, but it becomes a real headache when working with a large data set.

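```java
// Commits pending changes, then opens a brand-new reader on every call.
// The previous reader is never closed, so its file handles (or mapped regions) leak.
private void updateReader() throws IOException {
    dbWriter.commit();
    reader = DirectoryReader.open(index);
}
```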

While working with a large data set, I hit an OOM error after about 9 hours of indexing because too many files were open. In a small test, I ran `lsof | wc -l` and found that the number of open files kept increasing after each rebuild.

I tried closing the old reader inside updateReader(), but that introduces concurrency issues: another thread may still be searching with the reader that gets closed. Making the method synchronized does not help either, because pending updateReader() calls can pile up and exhaust the heap, potentially causing an "out of heap space" error.

My current solution is to use a local reader for each getHits() call and close it after use. Although this is expected to slow down searching, I did not notice much speed difference when checking informally.
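A minimal sketch of the "local reader per getHits() call" idea, assuming nothing about the plugin's real classes: `FakeReader` and `LocalReaderSearch` are hypothetical stand-ins that only count open readers (roughly what `lsof | wc -l` was measuring). In Lucene, `IndexReader` implements `Closeable`, so a reader returned by `DirectoryReader.open(...)` could be used with the same try-with-resources shape.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an IndexReader: a point-in-time snapshot that
// tracks how many instances are currently open.
class FakeReader implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private final List<String> docs;
    private boolean closed;

    FakeReader(List<String> docs) {
        this.docs = new ArrayList<>(docs); // snapshot of the index at open time
        OPEN.incrementAndGet();
    }

    synchronized List<String> search(String term) {
        if (closed) throw new IllegalStateException("reader already closed");
        List<String> hits = new ArrayList<>();
        for (String d : docs) {
            if (d.contains(term)) hits.add(d);
        }
        return hits;
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet(); // releases the "file handle"
        }
    }
}

// Hypothetical searcher: no reader field is shared between threads.
class LocalReaderSearch {
    private final List<String> index = new ArrayList<>();

    synchronized void addDocument(String doc) {
        index.add(doc);
    }

    // Each call opens its own reader (standing in for commit() + open())
    // and closes it before returning, so no reader outlives its search.
    List<String> getHits(String term) {
        FakeReader opened;
        synchronized (this) {
            opened = new FakeReader(index);
        }
        try (FakeReader reader = opened) {
            return reader.search(term);
        }
    }
}
```

The trade-off is the one noted above: every search pays the cost of opening a reader, in exchange for never having to close a reader that another thread might still be using.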

Suggestions for a better concurrency solution are welcome.
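One commonly used alternative is reference counting: keep a single shared reader, have each search acquire and release it, and close an old reader only when its last user releases it. Lucene exposes this through `IndexReader.incRef()`/`tryIncRef()`/`decRef()` and packages it as `SearcherManager` (`acquire()`, `release()`, `maybeRefresh()`) in `org.apache.lucene.search`; whether those fit depends on the Lucene version the plugin uses. The sketch below models only the pattern in plain Java; `RefCountedReader` and `ReaderManager` are hypothetical names, not plugin or Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted reader (same idea as IndexReader.incRef/decRef).
class RefCountedReader {
    static final AtomicInteger OPEN = new AtomicInteger();
    private final List<String> docs;
    private final AtomicInteger refCount = new AtomicInteger(1); // the manager's reference

    RefCountedReader(List<String> docs) {
        this.docs = new ArrayList<>(docs);
        OPEN.incrementAndGet();
    }

    // Fails if the reader has already been fully released.
    boolean tryIncRef() {
        while (true) {
            int c = refCount.get();
            if (c <= 0) return false;
            if (refCount.compareAndSet(c, c + 1)) return true;
        }
    }

    // The last decRef "closes" the reader and frees its files.
    void decRef() {
        if (refCount.decrementAndGet() == 0) OPEN.decrementAndGet();
    }

    List<String> search(String term) {
        if (refCount.get() <= 0) throw new IllegalStateException("reader already closed");
        List<String> hits = new ArrayList<>();
        for (String d : docs) {
            if (d.contains(term)) hits.add(d);
        }
        return hits;
    }
}

class ReaderManager {
    private final List<String> index = new ArrayList<>();
    private volatile RefCountedReader current = new RefCountedReader(index);

    synchronized void addDocument(String doc) {
        index.add(doc);
    }

    // Replacement for updateReader(): publish a new reader, then drop the
    // manager's reference to the old one. The old reader stays open until
    // every in-flight search has released it.
    synchronized void refresh() {
        RefCountedReader old = current;
        current = new RefCountedReader(index);
        old.decRef();
    }

    RefCountedReader acquire() {
        while (true) {
            RefCountedReader r = current;
            if (r.tryIncRef()) return r;
            // r was retired between the read and tryIncRef; retry with the newer reader
        }
    }

    void release(RefCountedReader r) {
        r.decRef();
    }

    List<String> getHits(String term) {
        RefCountedReader r = acquire();
        try {
            return r.search(term);
        } finally {
            release(r);
        }
    }
}
```

Unlike the per-call approach, searches share one reader until the next refresh, and unlike closing inside updateReader(), a refresh never pulls a reader out from under a running search.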