Closed schuyler closed 5 years ago
Note: I've tested this change by installing a dev package on Vagrant and confirming that the option appears in the JVM command-line:
$ ps auwx | grep java
tomcat7 1744 62.0 62.2 1762996 313828 ? Sl 22:18 1:03 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Djava.util.logging.config.file=/var/lib/tomcat7/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.awt.headless=true -Xmx256m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Djava.security.egd=file:/dev/./urandom -Djava.endorsed.dirs=/usr/share/tomcat7/endorsed -classpath /usr/share/tomcat7/bin/bootstrap.jar:/usr/share/tomcat7/bin/tomcat-juli.jar -Dcatalina.base=/var/lib/tomcat7 -Dcatalina.home=/usr/share/tomcat7 -Djava.io.tmpdir=/tmp/tomcat7-tomcat7-tmp org.apache.catalina.startup.Bootstrap start
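The same check can be scripted instead of read by eye from the `ps` output. This is a minimal sketch: the helper name `has_egd_flag` is hypothetical and not part of this PR, and the commented usage assumes the procps version of `ps`.

```shell
# Hypothetical helper: succeeds if the JVM command line passed as $1
# contains the java.security.egd option pointing at /dev/./urandom.
has_egd_flag() {
    case " $1 " in
        *" -Djava.security.egd=file:/dev/./urandom "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumes procps ps, which supports -C to select by command name):
# ps -o args= -C java | while read -r args; do
#     if has_egd_flag "$args"; then echo "egd option present"; fi
# done
```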
In theory this ought to solve the issue. We will have to keep an eye out and see if the slow startup times persist.
If this works, it's a much better solution than the symlink.
Do we know that this command-line option causes the change we want in the OpenMRS server startup process? For example, could you run a startup without this option, replicate the delay, then re-run it with this option and compare the log output?
I was able to repro the PRNG starvation issue by rebooting the NUC after a fresh install and configure, confirming that Debian had replaced /dev/random again, and then running the following loop in one tty while restarting the server and watching catalina.out in another:
$ while true; do time dd if=/dev/random of=/dev/null bs=1G count=1; done
I was able to produce "Creation of SecureRandom instance ... took [367,325] milliseconds" in the log. This was the bulk of the >6m start time for the Tomcat server. Watching the times on dd, I could confirm that large reads from /dev/random were slowing into the dozens of seconds.
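To compare runs without scanning the log by hand, the SecureRandom timing can be pulled out of the log text. This is a rough sketch: the helper name `securerandom_ms` is hypothetical, the pattern matches only the parts of the message quoted above, and the log path in the usage comment is an assumption based on the catalina.base shown earlier.

```shell
# Hypothetical helper: read log text on stdin and print, one per line,
# the millisecond figure from each "Creation of SecureRandom instance"
# message, with thousands separators removed.
securerandom_ms() {
    sed -n 's/.*Creation of SecureRandom instance.*took \[\([0-9,]*\)\] milliseconds.*/\1/p' | tr -d ','
}

# Usage (log path is an assumption; adjust to your install):
# securerandom_ms < /var/lib/tomcat7/logs/catalina.out
```

No output means the message never appeared, which is the expected result once the JVM option is in effect.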
With the shell loop still eating /dev/random, I added the CATALINA_OPTS line from this PR, and stopped and started Tomcat cold. From this point the "Creation of SecureRandom instance" line stops appearing in catalina.out, and start time returned to the range of 54,000-55,000 ms.
I restarted tomcat cold several times to confirm, with consistent server startup times. I rebooted the NUC, and was able again to confirm 55s start times for Tomcat.
Thanks for pushing me to confirm this result. I believe it is verified.
Fantastic! Thanks for the thorough testing.
We've experienced slow server startup times in the past, due to a fairly well-known issue with Catalina blocking while looking for randomness.
In 1562e92, we tackled this issue by replacing /dev/random with /dev/urandom; however, this approach has two disadvantages: one, the PRNG devices are replaced by Debian on boot; two, it addresses a JVM "bug" by making a system-wide change with possible unintended consequences.
This PR addresses the problem a little more directly by explicitly telling the JVM which PRNG to use when starting Catalina. The use of /dev/./urandom (sic) apparently sidesteps some cleverness internal to the JVM that we evidently don't want.
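For readers without the diff open, the kind of setting described here might look like the sketch below. The option string is the one visible in the JVM command line from the test note. The variable handling and the file this line would live in are assumptions, not copied from the PR.

```shell
# Sketch only: append the entropy-source option to whatever CATALINA_OPTS
# already holds. Where this line lives (for example a Tomcat setenv.sh or a
# distribution defaults file) is an assumption, not taken from the PR.
CATALINA_OPTS="${CATALINA_OPTS:-} -Djava.security.egd=file:/dev/./urandom"
export CATALINA_OPTS
```

Appending rather than overwriting keeps any options already set elsewhere.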