Open timja opened 8 years ago
I guess this actually led to a range of failing (old/existing) tests of PRs (e.g. for https://github.com/jenkinsci/email-ext-plugin/pull/170 for email-ext plugin currently maintained by davidvanlaatum) due to running into timeout:
Error Message test timed out after 180 seconds Stacktraceorg.junit.runners.model.TestTimedOutException: test timed out after 180 seconds at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267) at jenkins.InitReactorRunner.run(InitReactorRunner.java:45) at jenkins.model.Jenkins.executeReactor(Jenkins.java:1009) at jenkins.model.Jenkins.(Jenkins.java:877) at hudson.model.Hudson. (Hudson.java:85) at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:617) at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:390) at hudson.plugins.emailext.plugins.content.ScriptContentTest$1.before(ScriptContentTest.java:62) at org.jvnet.hudson.test.JenkinsRule$1.evaluate(JenkinsRule.java:543) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Standard Output === Starting testWhenTemplateNotFoundThrowFileNotFoundException(hudson.plugins.emailext.plugins.content.ScriptContentTest) Standard Error 0.009 [id=1157] INFO o.jvnet.hudson.test.JenkinsRule#createWebServer: Running on http://localhost:40893/jenkins/ 0.047 [id=1163] INFO jenkins.InitReactorRunner$1#onAttained: Started initialization 0.049 [id=1175] INFO jenkins.InitReactorRunner$1#onAttained: Listed all plugins 0.087 [id=1163] WARNING o.j.h.test.TestExtensionLoader#isActive: ignoring class org.jenkinsci.plugins.workflow.support.actions.LogActionImplTest$ChattyStep$DescriptorImpl compiled against jenkins-test-harness older than 2.16 0.088 [id=1163] WARNING o.j.h.test.TestExtensionLoader#isActive: ignoring class org.jenkinsci.plugins.workflow.steps.AbstractStepImplRestartTest$BlockStep$DescriptorImpl compiled against jenkins-test-harness older than 2.16 0.120 [id=1163] INFO jenkins.InitReactorRunner$1#onAttained: Prepared all plugins 0.120 [id=1176] INFO jenkins.InitReactorRunner$1#onAttained: Started all plugins 0.121 [id=1168] INFO jenkins.InitReactorRunner$1#onAttained: Augmented all extensions 0.292 [id=1169] INFO jenkins.InitReactorRunner$1#onAttained: Loaded all jobs 180.002 [id=1] WARNING o.j.hudson.test.JenkinsRule$2#evaluate: Test timed out (after 180 seconds). ... "SSHD.init" Id=1172 Group=FailOnTimeoutGroup RUNNABLE (in native) at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:255) at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:424) at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:441) - locked java.lang.Object@6deee370 at sun.security.provider.NativePRNG$RandomIO.access$500(NativePRNG.java:331) at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:226) at java.security.SecureRandom.generateSeed(SecureRandom.java:533) at org.apache.sshd.common.random.BouncyCastleRandom. (BouncyCastleRandom.java:57) at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:48) at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:41) at org.apache.sshd.common.random.SingletonRandomFactory. (SingletonRandomFactory.java:37) at org.apache.sshd.SshServer.setUpDefaultServer(SshServer.java:452) at org.jenkinsci.main.modules.sshd.SSHD.start(SSHD.java:83) - locked org.jenkinsci.main.modules.sshd.SSHD@49c17ba4 at org.jenkinsci.main.modules.sshd.SSHD.init(SSHD.java:146) at sun.reflect.GeneratedMethodAccessor284.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104) at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175) at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282) at jenkins.model.Jenkins$7.runTask(Jenkins.java:998) at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210) at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Number of locked synchronizers = 1 - java.util.concurrent.ThreadPoolExecutor$Worker@5414b9f
SSHD's BouncyCastleRandom class only uses the SecureRandom instance to generate an 8-byte seed using SecureRandom#generateSeed for a pure-Java PRNG from BouncyCastle, and the constructed BouncyCastleRandom instance is used as a singleton in SSHD, so that call should only block once. The most likely explanation is that it is using /dev/random, which is blocking occasionally when running tests.
We could change the SecureRandom instance to use the NativePRNGNonBlocking provider, which will use /dev/urandom for engineGenerateSeed.
Notably, it looks like SHA1PRNG's generateSeed and nextBytes methods use either /dev/random or /dev/urandom as their source based on the values of java.security.egd and securerandom.source, which we don't want to modify, so SHA1PRNG seems like a bad choice here. We could still use SHA1PRNG's engineNextBytes method, since the seed that it uses to initialize the state is computed statically, and so may already be ready by the time we want to use if if someone else initialized it, but we can't guarantee that it won't block
Since NativePRNGNonBlocking always uses /dev/urandom, and unlike in JENKINS-20108 we don't care about SHA1PRNG's increased throughput, I think switching to NativePRNGNonBlocking is the best option.
[Originally related to: JENKINS-20108]
Similar to the well-known problem in ssh-slaves but affecting the Jenkins SSH server. I noticed that workflow-plugin functional tests were running very slowly with low CPU usage, and saw in thread dumps
The bundled server should be switched to not rely on NativePRNG, which can be quite slow.
sudo apt-get install haveged seems to have improved test speed in my case, but I should not have to do this.
Originally reported by jglick, imported from: SSHD relying on NativePRNG