timja / jenkins-gh-issues-poc-06-18

[JENKINS-32510] SSHD relying on NativePRNG #7783

Open timja opened 8 years ago

timja commented 8 years ago

Similar to the well-known problem in ssh-slaves, but affecting the Jenkins SSH server. I noticed that workflow-plugin functional tests were running very slowly with low CPU usage, and saw the following in thread dumps:

"Executing testDeleteSubFolder(org.jenkinsci.plugins.workflow.steps.DeleteDirStepTest)" ...
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:267)
    - locked <...> (a jenkins.model.Jenkins$7)
    at jenkins.InitReactorRunner.run(InitReactorRunner.java:44)
    at jenkins.model.Jenkins.executeReactor(Jenkins.java:915)
    at jenkins.model.Jenkins.<init>(Jenkins.java:814)
    at hudson.model.Hudson.<init>(Hudson.java:83)
    at org.jvnet.hudson.test.JenkinsRule.newHudson(JenkinsRule.java:559)
    at org.jvnet.hudson.test.JenkinsRule.before(JenkinsRule.java:346)
    at ...
"SSHD.init" ...
   java.lang.Thread.State: RUNNABLE
    at java.io.FileInputStream.readBytes(Native Method)
    at java.io.FileInputStream.read(FileInputStream.java:255)
    at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:410)
    at sun.security.provider.NativePRNG$RandomIO.implGenerateSeed(NativePRNG.java:427)
    - locked <...> (a java.lang.Object)
    at sun.security.provider.NativePRNG$RandomIO.access$500(NativePRNG.java:329)
    at sun.security.provider.NativePRNG.engineGenerateSeed(NativePRNG.java:224)
    at java.security.SecureRandom.generateSeed(SecureRandom.java:533)
    at org.apache.sshd.common.random.BouncyCastleRandom.<init>(BouncyCastleRandom.java:57)
    at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:48)
    at org.apache.sshd.common.random.BouncyCastleRandom$Factory.create(BouncyCastleRandom.java:41)
    at org.apache.sshd.common.random.SingletonRandomFactory.<init>(SingletonRandomFactory.java:37)
    at org.apache.sshd.SshServer.setUpDefaultServer(SshServer.java:452)
    at org.jenkinsci.main.modules.sshd.SSHD.start(SSHD.java:83)
    - locked <...> (a org.jenkinsci.main.modules.sshd.SSHD)
    at org.jenkinsci.main.modules.sshd.SSHD.init(SSHD.java:146)
    at sun.reflect.GeneratedMethodAccessor149.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:106)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:176)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
    at jenkins.model.Jenkins$7.runTask(Jenkins.java:904)
    at ...

The bundled server should be switched to not rely on NativePRNG, which can be quite slow.

sudo apt-get install haveged seems to have improved test speed in my case, but I should not have to do this.


Originally reported by jglick, imported from: SSHD relying on NativePRNG
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 6 years ago

reinholdfuereder:

I guess this actually caused a range of old, existing tests to fail in PRs (e.g. https://github.com/jenkinsci/email-ext-plugin/pull/170 for the email-ext plugin, currently maintained by davidvanlaatum) because they ran into timeouts.

timja commented 6 years ago

svanoort:

dnusbaum Could I get you to take a look? Normally one wouldn't stress too much about test flakes, but this ties to a cluster of actual issues we've seen reported in the past and some of our colleagues have seen.

timja commented 6 years ago

dnusbaum:

SSHD's BouncyCastleRandom class uses the SecureRandom instance only to generate an 8-byte seed, via SecureRandom#generateSeed, for a pure-Java PRNG from BouncyCastle. The constructed BouncyCastleRandom instance is used as a singleton in SSHD, so that call should only block once. The most likely explanation is that it is reading from /dev/random, which occasionally blocks while the tests run.

We could change the SecureRandom instance to use the NativePRNGNonBlocking algorithm, which will use /dev/urandom for engineGenerateSeed.

Notably, it looks like SHA1PRNG's generateSeed and nextBytes methods use either /dev/random or /dev/urandom as their source, depending on the values of java.security.egd and securerandom.source, which we don't want to modify, so SHA1PRNG seems like a bad choice here. We could still use SHA1PRNG's engineNextBytes method, since the seed that it uses to initialize its state is computed statically and so may already be ready by the time we want to use it if someone else initialized it, but we can't guarantee that it won't block.
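To check which values a given JVM actually has for the two settings mentioned above, something like the following could be run. It is only a diagnostic sketch: the class name SeedSourceConfig is made up for illustration, and it just prints the values without changing them.

```java
import java.security.Security;

// Hypothetical helper (not part of Jenkins or SSHD): reports the two settings
// that influence where the JDK's seed generator reads its entropy from.
class SeedSourceConfig {

    static String describe() {
        // System property, typically set with -Djava.security.egd=... (null if unset).
        String egd = System.getProperty("java.security.egd");
        // Security property, normally defined in the JDK's java.security file.
        String source = Security.getProperty("securerandom.source");
        return "java.security.egd=" + egd + ", securerandom.source=" + source;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Either value may be reported as null if it is not set in the running JVM.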

Since NativePRNGNonBlocking always uses /dev/urandom, and unlike in JENKINS-20108 we don't care about SHA1PRNG's increased throughput, I think switching to NativePRNGNonBlocking is the best option.
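A minimal sketch of what requesting NativePRNGNonBlocking could look like, assuming a Java 8+ runtime. This is not the actual SSHD or Jenkins code: the class NonBlockingSeedDemo and its methods are hypothetical. The algorithm is not available on every platform (for example, Windows), so the sketch falls back to the default SecureRandom when the lookup fails.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical illustration only; names are not taken from SSHD or Jenkins.
class NonBlockingSeedDemo {

    // Prefer NativePRNGNonBlocking; fall back to the platform default if it is missing.
    static SecureRandom nonBlockingRandom() {
        try {
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            // Assumption: the default is acceptable where the algorithm does not exist.
            // Note that the default may still block in generateSeed on some systems.
            return new SecureRandom();
        }
    }

    // Generate a seed of the requested size, e.g. 8 bytes as discussed above.
    static byte[] seed(int numBytes) {
        return nonBlockingRandom().generateSeed(numBytes);
    }

    public static void main(String[] args) {
        SecureRandom random = nonBlockingRandom();
        byte[] seed = random.generateSeed(8);
        System.out.println(random.getAlgorithm() + " produced " + seed.length + " seed bytes");
    }
}
```

How such an instance would be handed to SSHD's random factory is left open here, since that depends on the SSHD version in use.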

timja commented 2 years ago

[Originally related to: JENKINS-20108]