oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Growing number of suggester files in Tomcat temp directory #2857

Open anishk25 opened 5 years ago

anishk25 commented 5 years ago

The Tomcat temp directory for my OpenGrok server (/opt/flatiron/mounts/var/lib/tomcat8/temp/) is filling up with opengrok_suggester_wfst_input_XX.tmp and opengrok_suggester_wfst_sort_XX.tmp files. I have been tracking the number of files of this type created by the suggester and the amount of space they take up; both have been growing steadily. Is there a way to configure OpenGrok so that these files get cleaned up automatically?
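
For anyone wanting to track the growth the same way, a minimal sketch (the temp path below is this reporter's; substitute whatever your `-Djava.io.tmpdir` points to):

```shell
# Count and total the suggester temp files in the Tomcat temp directory.
# TEMP_DIR is an assumption taken from the report above; adjust as needed.
TEMP_DIR=/opt/flatiron/mounts/var/lib/tomcat8/temp
find "$TEMP_DIR" -name 'opengrok_suggester_wfst_*.tmp' | wc -l
find "$TEMP_DIR" -name 'opengrok_suggester_wfst_*.tmp' -exec du -ch {} + | tail -n1
```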

vladak commented 5 years ago

There was a similar issue filed; however, I cannot find it right now.

vladak commented 5 years ago

Definitely, the temporary files should be cleaned up automatically.

tulinkry commented 5 years ago

It is still the same issue even if you can't find it.

anishk25 commented 5 years ago

I think you might be referring to this issue: https://github.com/oracle/opengrok/issues/2240

vladak commented 5 years ago

Yep, that's the one. What version are you running?

anishk25 commented 5 years ago

I am running version 1.2.7

anishk25 commented 5 years ago

I am guessing I need to upgrade to a later version to fix this issue.

vladak commented 5 years ago

It may be worth trying.

ahornace commented 5 years ago

Those files are completely in Lucene's control.

https://github.com/apache/lucene-solr/blob/master/lucene/suggest/src/java/org/apache/lucene/search/suggest/SortedInputIterator.java#L176 https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/OfflineSorter.java#L317

The best approach would probably be to file an issue on Lucene's side.

However, we might be able to explicitly delete all the files in the temp directory with the opengrok_suggester_wfst prefix after the suggester builds its data structures. I don't like it, but there is not much else we can do. What do you guys think?

unhipzippo commented 4 years ago

FYI, problem is still present in 1.3.6.

vladak commented 4 years ago

@ahornace as long as the files are no longer needed, that could be a workaround.

vladak commented 4 years ago

Checking our production instance (runs 1.3.6, Tomcat uses -Djava.io.tmpdir=/var/tomcat8/temp) I don't see the issue. Also, the Tomcat process does not currently have any of these files opened.
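
To double-check whether the Tomcat process still holds any of these files open, one can inspect its file descriptors; the PID lookup below is an assumption (it matches Tomcat's usual `Bootstrap` main class) and assumes a Linux `/proc` filesystem:

```shell
# Hypothetical check: find the Tomcat JVM and list any suggester temp
# files it still has open. Assumes Linux /proc and the standard
# org.apache.catalina.startup.Bootstrap main class.
pid=$(pgrep -f org.apache.catalina.startup.Bootstrap | head -n1)
ls -l "/proc/$pid/fd" 2>/dev/null | grep opengrok_suggester || echo "no suggester temp files open"
```

Files that appear here are still in use and must not be deleted; files that do not are safe to clean up.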

unhipzippo commented 4 years ago

What version of Tomcat?

We're using 9.0.27 with AdoptOpenJDK 11.0.3+7 and with a systemd script that sets up (among other things):

Environment=CATALINA_HOME=/opt/opengrok/apache-tomcat-9.0.27
Environment=CATALINA_BASE=/var/opt/opengrok/hydra/tomcat
Environment='JAVA_OPTS=\
   -Xms1g -Xmx1g \
   -Djava.awt.headless=true \
   -Dport.shutdown=8005 \
   -Dport.ajp=8009 \
   -Dport.http=8080 \
   -Dldap.group=thwur \
   -Dcom.sun.jndi.ldap.connect.pool.timeout=1200000 \
   -Djava.util.logging.config.file=/var/opt/opengrok/hydra/tomcat/conf/logging.properties \
   -Dlog.dir=/var/opt/opengrok/hydra/log \
   -Dconfig.xml=/var/opt/opengrok/hydra/etc/configuration.xml'

ExecStart=/opt/opengrok/java/jdk/bin/java \
   -classpath ${CLASSPATH} \
   -Dcatalina.home=${CATALINA_HOME} \
   -Dcatalina.base=${CATALINA_BASE} \
   -Djava.endorsed.dirs=${JAVA_ENDORSED_DIRS} \
   -Djava.io.tmpdir=${CATALINA_BASE}/temp \
   -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager \
   $JAVA_OPTS $CATALINA_OPTS \
   org.apache.catalina.startup.Bootstrap \
   start

Suggester tempfiles seem to be getting touched around midnight local time and 3am local time (we have a regular cron indexing job that runs daily at 3am -- not sure what's touching the files at midnight).

vladak commented 4 years ago

Running Tomcat 8.5.45 with Oracle JDK 1.8.0_231.

The Suggester itself has a cron4J setup that triggers the rebuild at midnight every day by default:

public static final String REBUILD_CRON_CONFIG_DEFAULT = "0 0 * * *"; // every day at midnight
...
rebuildCronConfig = REBUILD_CRON_CONFIG_DEFAULT;

exwindchiller commented 4 years ago

We're experiencing the same issue. 300M of Tomcat suggester tmp files are added every time indexer runs. They are never deleted.

oacnhpkqxjhufwumkkqnshvlmheokqv commented 3 years ago

Seeing similar behaviour. Following for updates.

vladak commented 3 years ago

Can anyone share full indexer logs for a case where this problem happens?

vladak commented 3 years ago

Actually, I can see this on the production instance I maintain. The last case happened on Nov 24th:

-rw-r-----   1 webservd webservd 636931654 Nov 24 21:33 opengrok_suggester_wfst_input_f.tmp
-rw-r-----   1 webservd webservd 636931654 Nov 24 21:34 opengrok_suggester_wfst_sort_g.tmp
-rw-r-----   1 webservd webservd 636927126 Nov 24 22:14 opengrok_suggester_wfst_input_8m.tmp
-rw-r-----   1 webservd webservd 636927126 Nov 24 22:16 opengrok_suggester_wfst_sort_8n.tmp

this correlates with an OOM exception in the logs (I deleted the hprof file generated by the OOM handler in the meantime, so unfortunately I cannot look at its modification timestamp):

24-Nov-2020 21:17:08.789 INFO [ForkJoinPool.commonPool-worker-27] org.opengrok.suggest.Suggester.rebuild Rebuilding the following suggesters: [userland-default-prepped]
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/tomcat8/logs/dumps/java_pid13536.hprof ...
Heap dump file created [7696664628 bytes in 76.132 secs]
24-Nov-2020 21:38:50.979 INFO [ForkJoinPool.commonPool-worker-27] org.opengrok.suggest.Suggester.shutdownAndAwaitTermination Suggesters for [userland-default-prepped] were successfully rebuilt (took 21 minutes 42 seconds)

So I guess there is some path in the Lucene code that allows the temporary files not to be deleted. Unfortunately, the Lucene APIs do not allow for explicit closing (particularly the SortedInputIterator class).

vladak commented 3 years ago

Sent a query to the Lucene user's mailing list: https://marc.info/?l=lucene-user&m=160673958030670&w=2

unhipzippo commented 3 years ago

Thanks, @vladak.

Apologies for not following up on this sooner.

I've since noted that opengrok_suggester*.tmp files are generated even if I disable the indexer entirely.

I.e., while doing a large source pull (manually, not using the OpenGrok Python scripts) for an instance containing a large amount of code (2.5+ TB), I've disabled the indexer entirely until I'm sure all of the source code I want is present.

Despite this, so long as the webapp is running, more and more suggester tempfiles continue to get generated. Eventually (after a couple of days), they consume all of the free space on the disk.

I'll probably end up disabling the suggester in the webapp entirely in the short term, at least until I can complete the source pull. :)
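
For reference, disabling the suggester should be possible through the read-only configuration. The fragment below is only a sketch: it assumes the usual java.beans XMLEncoder layout of configuration.xml and that the `SuggesterConfig` bean exposes an `enabled` property; verify the exact class name and property against your OpenGrok version.

```xml
<!-- Sketch, not verified against any particular release: disable the
     suggester via the suggesterConfig property of the configuration. -->
<void property="suggesterConfig">
  <object class="org.opengrok.indexer.configuration.SuggesterConfig">
    <void property="enabled">
      <boolean>false</boolean>
    </void>
  </object>
</void>
```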

unhipzippo commented 3 years ago

root@oncs-opengrok-3:[~]: find /var/opt/opengrok/sustaining/tomcat/ -name opengrok_suggester\*tmp -print | wc -l
18
root@oncs-opengrok-3:[~]: find /var/opt/opengrok/sustaining/tomcat/ -name opengrok_suggester\*tmp -print0 | du --files0-from=- -hc | tail -n1
3.3G    total

root@oncs-opengrok-3:[~]: find /var/opt/opengrok/sustaining/tomcat/temp/ -name opengrok_suggester\*tmp -print | wc -l
20
root@oncs-opengrok-3:[~]: find /var/opt/opengrok/sustaining/tomcat/temp/ -name opengrok_suggester\*tmp -print0 | du --files0-from=- -hc | tail -n1
3.9G    total

root@oncs-opengrok-3:[~]: find /var/opt/opengrok/sustaining/tomcat/temp/ -name opengrok_suggester\*tmp -print | wc -l
29
root@oncs-opengrok-3:[~]: find /var/opt/opengrok/sustaining/tomcat/temp/ -name opengrok_suggester\*tmp -print0 | du --files0-from=- -hc | tail -n1
5.3G    total

These queries are around 5 min apart.

vladak commented 3 years ago

@unhipzippo yes, the suggester runs within the web application (for the time being - #3378). The indexer can merely trigger a suggester rebuild. At first I thought this could be related to the large source pull you were doing; however, non-existent indexes are not considered by the suggester. Anything interesting in the Tomcat logs?

vladak commented 3 years ago

Also, does the number of temporary files created over a given time frame correspond to the number of projects? The suggester rebuild is normally done during reindex (once a project is indexed) and also on a schedule (by default once a day, around midnight).

unhipzippo commented 3 years ago

No explicit errors in catalina.log, but now that I look at the logs, I notice that since Nov 13, I see hundreds of thousands of warnings a day like this:

01-Dec-2020 11:21:01.277 WARNING [main] org.apache.tomcat.util.scan.StandardJarScanner.processURLs Failed to scan [file:/var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/lib/activation-1.1.1.jar] from classloader hierarchy

(Note the redundant "..../lib/lib...", which doesn't look right - not sure where that comes from)

root@oncs-opengrok-3:[~]: ls /var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/lib/activation-1.1.1.jar
ls: cannot access /var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/lib/activation-1.1.1.jar: No such file or directory

root@oncs-opengrok-3:[~]: ls /var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/activation-1.1.1.jar
ls: cannot access /var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/activation-1.1.1.jar: No such file or directory

This particular file seems to be "1.1", not the "1.1.1" being searched for:

root@oncs-opengrok-3:[~]: ls /var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/activation*
/var/opt/opengrok/sustaining/tomcat/webapps/sustaining/WEB-INF/lib/activation-1.1.jar

I'm not sure if these messages correlate to when I noticed the *.tmp files starting to exhaust the disk space, but it's a possibility.

(FYI, I was running OpenGrok 1.4.9 with AdoptOpenJDK 11.0.9+11 for the past month or two, then upgraded to OpenGrok 1.5.7 on Nov 22 -- since the Tomcat warnings occurred before the upgrade, and apparently with the same frequency, I don't think they are related to the 1.4.9 -> 1.5.7 upgrade)

unhipzippo commented 3 years ago

As to the number of temporary files created, I'll have to hold off on confirming that -- once I ran out of disk space, I deleted them all without logging how many there were.

Once the source pull has completed (possibly a couple of more days - ClearCase snapshot views are slow to refresh content from scratch over the network), I can re-enable the suggester and see if the tmp files start re-occurring, and pay closer attention to their frequency / number compared to the projects.

vladak commented 3 years ago

I'm not sure if these messages correlate to when I noticed the *.tmp files starting to exhaust the disk space, but it's a possibility.

I'd say these are not related, the issue is older than these log entries from what I can tell.

(FYI, I was running OpenGrok 1.4.9 with AdoptOpenJDK 11.0.9+11 for the past month or two, then upgraded to OpenGrok 1.5.7 on Nov 22 -- since the Tomcat warnings occurred before the upgrade, and apparently with the same frequency, I don't think they are related to the 1.4.9 -> 1.5.7 upgrade)

Interesting. I was about to blame these messages on the Java 11 switchover; however, the cause might be different. Filed #3384 to track this.

oacnhpkqxjhufwumkkqnshvlmheokqv commented 3 years ago

This is hitting my dev instance again, the .tmp files are 364 MB - 553 MB each, there are 515 files, totalling 268 GB, dated between Jan 13 and today. Is there a specific log file I can find for you @vladak? Thanks!

OpenGrok v1.5.10

$ docker exec -i opengrok-docker /usr/local/tomcat/bin/version.sh
NOTE: Picked up JDK_JAVA_OPTIONS:  --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.rmi/sun.rmi.transport=ALL-UNNAMED
Server version: Apache Tomcat/9.0.40
Server built:   Nov 12 2020 15:35:02 UTC
Server number:  9.0.40.0
OS Name:        Linux
OS Version:     3.10.0-1160.11.1.el7.x86_64
Architecture:   amd64
JVM Version:    11.0.9.1+1
JVM Vendor:     Oracle Corporation

$ docker exec -i opengrok-docker java -version
openjdk version "11.0.9.1" 2020-11-04
OpenJDK Runtime Environment 18.9 (build 11.0.9.1+1)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.9.1+1, mixed mode)

oacnhpkqxjhufwumkkqnshvlmheokqv commented 3 years ago

I'm not sure if it's related or not, but this is showing up in the logs on that same dev instance, and on a prod instance that I just patched. Weird that it's not happening to my staging instances that are patched the same as the dev and prod instance.

WARNING [main] org.apache.tomcat.util.digester.Digester.endElement No rules found matching [Server/Service/Engine/Host/Disabled]

vladak commented 3 years ago

It would be nice if someone could come up with a reproducible case using open-source/public repositories.

oacnhpkqxjhufwumkkqnshvlmheokqv commented 3 years ago

@vladak How do I go about providing this?

vladak commented 3 years ago

If these are public/open repositories, just give the pointers and any relevant docker settings (e.g. environment variables).

ChristopheBordieu commented 3 years ago

Hi,

Faced the same issue with OG 1.5.11 / Tomcat 9 / Java 11. No disk space left, so the VM was in bad shape. Deleted all the files.

What is the workaround? Is a cron job that systematically removes all these suggester temporary files older than N days a good choice? What value for N, knowing the indexer runs nightly and the webapp is occasionally restarted? Or would a wrapper that restarts the webapp and gets rid of the temp files be better?

vladak commented 3 years ago

Yeah, I'd go with a cron job. I'd love it if anyone could come up with a reproducible case.
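
A minimal sketch of such a cron job (the path and the 3-day retention window are assumptions; point it at whatever `-Djava.io.tmpdir` resolves to on your installation, and keep the window long enough not to race a rebuild in progress):

```shell
# Hypothetical cleanup: remove suggester temp files older than 3 days
# from the Tomcat temp directory. Path and age are assumptions.
TOMCAT_TEMP=/var/tomcat8/temp
find "$TOMCAT_TEMP" -name 'opengrok_suggester_wfst_*.tmp' -mtime +3 -delete
```

Scheduled from crontab, e.g. `30 4 * * * find /var/tomcat8/temp -name 'opengrok_suggester_wfst_*.tmp' -mtime +3 -delete`, this runs well after the default midnight suggester rebuild and the nightly reindex.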

liamwarfield commented 3 years ago

I don't know if this helps, but on my deployment, this seems to correspond to this error log:

Couldn't notify the webapp that project org.opengrok.indexer.configuration.Project@ed12966 was indexed: InboundJaxrsResponse{context=ClientResponse{method=PUT, uri=http://betaogtomcat:2424/source/api/v1/projects/a-project/indexed, status=504, reason=Gateway Time-out}}

It also appears that the webapp does get updated: the "Last Index update" continues to update. I have the indexer timeout set to 6 hrs in my readonly_configuration.

ghost commented 2 years ago

FYI, the issue can be reproduced with OpenGrok 1.7.13.