JustSong closed this issue 5 years ago
After some testing, it seems more projects can be deployed after adding some arguments to the <tomcat8>/bin/catalina.sh
file, but I think this is just a workaround solution...
About 16 projects can be deployed with:
CATALINA_OPTS="-Xms8g -Xmx12g -Xincgc -XX:MaxPermSize=256m"
About 12 projects can be deployed with:
CATALINA_OPTS="-Xms4g -Xmx8g -Xincgc -XX:MaxPermSize=256m"
About 5 projects can be deployed with:
CATALINA_OPTS="-Xms2g -Xmx2g -Xincgc -XX:MaxPermSize=256m"
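As a side note, Tomcat's startup scripts source an optional setenv.sh file, which is the usual place for options like these so that catalina.sh itself stays unmodified. A minimal sketch (the heap values simply mirror one of the lines above; note that -XX:MaxPermSize only applies up to Java 7, since PermGen was removed in Java 8, and -Xincgc is deprecated in modern JDKs):

```shell
# <tomcat8>/bin/setenv.sh -- sourced by catalina.sh on startup.
# Heap sizes here are examples taken from the values above; tune them
# for your own deployment.
CATALINA_OPTS="-Xms4g -Xmx8g"
export CATALINA_OPTS
```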
How big are your index data and especially the suggester data (per project)?
I remember when we upgraded from 1.0 to 1.1 internally, the memory requirements grew enough that an OutOfMemoryError
was hit. Dissecting the memory dump with an analyzer, I figured out that the suggester prompted the memory increase, and it was in fact legitimate.
This needs to be documented better on the wiki.
In general, this is the magic of doing capacity planning for a Java application: either by trial and error, or by careful measurement and computation.
https://github.com/oracle/opengrok/wiki/Tuning-for-large-code-bases#web-application has a section that explains how to take the suggester data into account.
For reference, here's some data from an internal hiccup that happened when transitioning to 1.1. Some time after 1.1 was deployed, the application server (Tomcat in our case) started crashing with OOM exceptions.
In top(1) (this is from a Solaris machine) it looked like this:
PID USERNAME NLWP PRI NICE SIZE RES STATE TIME CPU COMMAND
992 webservd 112 10 0 418G 18G cpu/18 461.4H 59.66% java
By doing some tracing, it was figured out that the process was evidently busy with GC, chasing the last remaining bits of the Java heap.
Thus -XX:+HeapDumpOnOutOfMemoryError
was added to the Java arguments of the app server, and we waited for the next OOM to get a dump.
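For completeness, this is what that step looks like as Tomcat configuration; the dump path is an assumption (any directory writable by the server process works):

```shell
# Make the JVM write a heap dump on the first OutOfMemoryError, so the
# dump can later be analyzed with a tool such as MAT.
CATALINA_OPTS="$CATALINA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# Example path only -- pick a location with enough free space, since the
# dump is roughly the size of the used heap.
CATALINA_OPTS="$CATALINA_OPTS -XX:HeapDumpPath=/var/tmp/tomcat-heap.hprof"
export CATALINA_OPTS
```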
Analyzing the dump with MAT (ironically, MAT's own Java heap had to be increased as well so it could actually complete the analysis), it was found that the Suggester was eating some 4 GiB+ of data, more than half of the heap.
Interestingly, only a small portion of the projects contributed heavily to this size (https://en.wikipedia.org/wiki/Pareto_principle in action). Out of our 300+ projects, only 20 or so summed up to the 4 gigs. As @ahornace wrote, the suggester data for the Linux kernel takes some 20 MiB (5x10^6 terms).
The biggest Suggester footprint was ~300 MiB for a single project, and it trailed off quickly after that:
# cd /opengrok/data/suggester
# ls -1 | while read proj; do echo -n $proj " "; gfind $proj -type f -name '*.wfst' -printf '%s\n' | awk '{ sum += $0; } END { print sum; }'; done | sort +1 -n | tail -20
proj1 35485144
proj2 37001192
proj3 37173219
proj4 40763143
proj5 42073189
proj6 42073364
proj7 42187526
proj8 42334809
proj9 45023858
proj10 46612295
proj11 52972279
proj12 62475424
proj13 64643156
proj14 65392272
proj15 67582078
proj16 72876722
proj17 73073343
proj18 112418978
proj19 126162613
proj20 325016849
So, in the end I summed the sizes of all *.wfst
files under the Data Root, multiplied the sum by some constant, and bumped the Java heap by that value. Here's a (rather telling) excerpt from Tomcat's setenv.sh:
# OpenGrok memory boost to cover all-project searches
# (7 MB * 247 projects + 300 MB for cache should be enough)
# 64-bit Java allows for more so let's use 8GB to be on the safe side.
# We might need to allow more for concurrent all-project searches.
# However, with OpenGrok 1.1 the suggester requires more memory for
# each project (in one case the suggester footprint was 4.5 GB)
# so bump the 8 GB to 16 GB to be on the safe side.
JAVA_OPTS="$JAVA_OPTS -Xmx16g"
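The sizing procedure described above (sum the *.wfst sizes, multiply by a constant) can be sketched as a small shell function; the 2x safety factor is just an illustrative assumption, as is the example path in the comment:

```shell
# Sum the sizes of all *.wfst files under the given directory and print
# a suggested extra heap allowance in MiB (total size * 2 as a margin).
# Requires GNU find for -printf (gfind on Solaris).
estimate_heap_mib() {
    find "$1" -type f -name '*.wfst' -printf '%s\n' 2>/dev/null \
      | awk '{ sum += $1 } END { printf "%d\n", sum * 2 / 1024 / 1024 }'
}

# Example invocation (path is an assumption; use your own data root):
# estimate_heap_mib /opengrok/data/suggester
```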
The moral of the story is that the Java heap size needs to be set with regard to the worst-case scenario (many all-project/multi-project searches happening concurrently) and with ample headroom for growth of the indexed data (adding more projects / deployed webapps).
Hi @vladak ,
Thanks a lot for your detailed information. I think this could be written into the OpenGrok documentation so that others can learn about it.
Here is our OpenGrok project structure, we set up each project as a different URL location:
source1 => https://opengrok.studio.com/source1
source2 => https://opengrok.studio.com/source2
source3 => https://opengrok.studio.com/source3
...
source8 => https://opengrok.studio.com/source8
Our projects' data sizes (25 GB * 24 projects uses almost 8 GB of memory):
|--- Source Code (25GB)
|--- index (6GB)
|--- xref (4.7 GB)
|--- historycache (3.5 GB)
|--- suggester (1.4 GB)
I have already set the arguments -Xms12g -Xmx12g -Xincgc
in the <tomcat8>/bin/catalina.sh
file and the problem was resolved.
Thanks again for your kind help!
You're welcome. It's reasonably well documented in the wikis now I think.
Here is our OpenGrok project structure, we set up each project as a different URL location:
We found that with this version of OpenGrok, the system gets into a high-memory-usage situation... If we add one more project (.war file) to the
<tomcat8>/webapps
directory, Tomcat hits a deployment error. The <tomcat8>/logs/catalina.out
error message: Memory Status:
Is there any setting I need to pay attention to?