sabre1041 / ose-jenkins-cluster

[DEPRECATED] Jenkins slave cluster on OpenShift

Running jenkins slave via Kubernetes Plug-in throws exception #3

Closed ganrad closed 8 years ago

ganrad commented 8 years ago

When I use the Kubernetes plug-in to dynamically spawn a jenkins-slave pod, the pod starts and then throws an "Illegal tunneling parameter" exception. Details are pasted below:

[gradhakr@master ose-jenkins-cluster]$ oc get pods
NAME                    READY     STATUS      RESTARTS   AGE
459cef8b6896            0/1       Error       0          29s
jenkins-1-afr0h         1/1       Running     0          20m
jenkins-1-build         0/1       Error       0          13h
jenkins-2-build         0/1       Error       0          12h
jenkins-3-build         0/1       Completed   0          12h
jenkins-slave-1-build   0/1       Error       0          13h
jenkins-slave-2-build   0/1       Completed   0          12h
[gradhakr@master ose-jenkins-cluster]$ oc logs 459cef8b6896
Running Jenkins JNLP Slave....
Mar 11, 2016 2:15:41 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: 459cef8b6896
Mar 11, 2016 2:15:41 PM hudson.remoting.jnlp.Main$CuiListener
INFO: Jenkins agent is running in headless mode.
Mar 11, 2016 2:15:41 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-jenkins-ms.ose3.evanbills.com]
Mar 11, 2016 2:15:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 11, 2016 2:15:42 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Illegal tunneling parameter: http://172.30.57.190:50000
java.io.IOException: Illegal tunneling parameter: http://172.30.57.190:50000
        at hudson.remoting.Engine.connect(Engine.java:309)
        at hudson.remoting.Engine.run(Engine.java:242)

[gradhakr@master ose-jenkins-cluster]$ oc get svc
NAME            CLUSTER_IP      EXTERNAL_IP   PORT(S)     SELECTOR       AGE
jenkins         172.30.118.72                 8080/TCP    name=jenkins   13h
jenkins-slave   172.30.57.190                 50000/TCP   name=jenkins   13h
[gradhakr@master ose-jenkins-cluster]$ curl http://172.30.57.190:50000

Any idea where I am going wrong? Thanks.

sabre1041 commented 8 years ago

@ganrad in the Kubernetes plugin configuration in Jenkins, does your Jenkins Tunnel address contain the protocol (http://)? The tunnel address should contain only the host and port, without a scheme, as in the following configuration:

Jenkins Kubernetes Plugin Configuration
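
The "Illegal tunneling parameter" error above suggests the agent expects a plain host:port value rather than a URL. As a minimal sketch (not part of the plugin or of Jenkins remoting), a helper like the hypothetical `normalize_tunnel` below could strip the scheme before the value is pasted into the Jenkins Tunnel field. It assumes both a host and a port are given.

```python
from urllib.parse import urlsplit


def normalize_tunnel(value: str) -> str:
    """Return a bare host:port string, dropping any scheme and path."""
    value = value.strip()
    # urlsplit only fills in hostname/port when the value starts with "//",
    # so add that prefix to inputs that carry no scheme.
    parts = urlsplit(value if "://" in value else "//" + value)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected host:port, got {value!r}")
    return f"{parts.hostname}:{parts.port}"


# The value from the failing pod, and the DNS form discussed in this thread.
print(normalize_tunnel("http://172.30.57.190:50000"))
print(normalize_tunnel("jenkins-slave.jenkins.svc.cluster.local:50000"))
```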

ganrad commented 8 years ago

Yes, you were right. After I removed the protocol/scheme (http://) from the tunnel URL, the build job ran fine: the jenkins-slave pod was spawned and the build job executed. However, I had to specify the service IP + port of the 'jenkins-slave' service. The service DNS (SkyDNS) name didn't work for me (point 2 below).

I observed a few other issues (which could be specific to my OSE instance):

  1. After the build job completes in the jenkins-slave pod, the pod continues to exist for a while and then terminates with a status of 'Error' (Failed). I am not sure whether this is the intended behavior. If the job completes OK (returns zero status), I would expect the jenkins-slave pod to terminate with a success status (Succeeded). Running another Jenkins build job spawns a new slave pod in OSE (which is good).
  2. The tunnel URL 'jenkins-slave.jenkins.svc.cluster.local:50000' does not work in my environment. I take it we are trying to route the request through the built-in service DNS (SkyDNS). The URL scheme per the Kube docs is [service name].[namespace].svc.cluster.local. In my instance, the jenkins-slave pod only starts and executes the job when I specify the actual service IP + port. Is there a way to look at the service entries stored in SkyDNS (KubeToSky CLI)? Or a way to look at the entries stored in the embedded etcd of SkyDNS? Any pointers here would be nice.

Thanks.

sabre1041 commented 8 years ago

@ganrad which master/slave paradigm are you implementing? Is the jenkins master running in OpenShift or external?

The following are responses to your inquiries:

  1. Assuming you are using the Kubernetes plugin to dynamically provision slaves, can you double-check that the jenkins-slave deploymentconfig is scaled down to 0 (though it should support both static and dynamic implementations)? There is a chance it could be a bug on the Kubernetes plugin side.
  2. If your slave is running in the jenkins namespace (project), then SkyDNS should work. The section on OpenShift DNS in the documentation provides steps to help validate that it is running in your cluster.
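
As a rough way to test point 2 from inside a pod, the sketch below builds the service name using the [service name].[namespace].svc.cluster.local scheme quoted earlier and tries to resolve it with the standard library. The function names are illustrative, and the `cluster.local` suffix is taken from this thread; a cluster configured with a different domain would need a different suffix.

```python
import socket


def service_dns_name(service: str, namespace: str) -> str:
    """Build the name as [service name].[namespace].svc.cluster.local."""
    return f"{service}.{namespace}.svc.cluster.local"


def resolve_service(service, namespace, port, resolver=socket.getaddrinfo):
    """Return the sorted addresses the name resolves to, or [] if lookup fails."""
    name = service_dns_name(service, namespace)
    try:
        infos = resolver(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(service_dns_name("jenkins-slave", "jenkins"))
```

Calling `resolve_service("jenkins-slave", "jenkins", 50000)` from a pod in the cluster and getting an empty list back would point at DNS rather than at Jenkins. The `resolver` parameter exists only so the lookup can be stubbed out when no cluster is available.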

Hope this information helps

ganrad commented 8 years ago

I am running both Jenkins Master + Slave in OpenShift (not external master).

  1. Yes, I did scale down the jenkins-slave RC (pods = 0) before running the build job via Jenkins and the Kubernetes plug-in. I ran the build job a couple of times and got the same result: the job completed OK but the pod status was 'Error'. A new pod was spawned to run every build job.
  2. I will go through the OpenShift docs tonight and troubleshoot SkyDNS.

Thanks.

ganrad commented 8 years ago

Here is some more detailed info.

To execute a build job, the Kubernetes plug-in spawns a jenkins-slave pod and runs the job. The job finishes OK, then the pod disconnects from the master, tries to reconnect, and eventually fails.

[gradhakr@master ~]$ oc get pods
NAME                    READY     STATUS      RESTARTS   AGE
1398c8488c2f9           1/1       Running     0          2m
jenkins-1-afr0h         1/1       Running     0          3d
jenkins-3-build         0/1       Completed   0          3d
jenkins-slave-2-build   0/1       Completed   0          3d
[gradhakr@master ~]$ oc logs -f 1398c8488c2f9
Running Jenkins JNLP Slave....
Mar 14, 2016 5:48:02 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: 1398c8488c2f9
Mar 14, 2016 5:48:22 PM hudson.remoting.jnlp.Main$CuiListener
INFO: Jenkins agent is running in headless mode.
Mar 14, 2016 5:48:23 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-jenkins-ms.ose3.evanbills.com]
Mar 14, 2016 5:48:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 14, 2016 5:48:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.30.57.190:50000
Mar 14, 2016 5:48:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Mar 14, 2016 5:48:42 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Mar 14, 2016 5:50:49 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Mar 14, 2016 5:50:59 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect
INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@561c0b36
Mar 14, 2016 5:52:52 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: 1398c8488c2f9
Mar 14, 2016 5:52:53 PM hudson.remoting.jnlp.Main$CuiListener
INFO: Jenkins agent is running in headless mode.
Mar 14, 2016 5:52:53 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins-jenkins-ms.ose3.evanbills.com]
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.30.57.190:50000
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP2-connect
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't understand the protocol: Unrecognized name: 1398c8488c2f9
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.30.57.190:50000
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP-connect
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Server didn't understand the protocol: No such slave: 1398c8488c2f9
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to 172.30.57.190:50000
Mar 14, 2016 5:53:03 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: The server rejected the connection: None of the protocols were accepted
java.lang.Exception: The server rejected the connection: None of the protocols were accepted
        at hudson.remoting.Engine.onConnectionRejected(Engine.java:297)
        at hudson.remoting.Engine.run(Engine.java:268)

[gradhakr@master ~]$ oc get pods
NAME                    READY     STATUS      RESTARTS   AGE
1398c8488c2f9           0/1       Error       0          7m
139b402006b7d           1/1       Running     0          4m
jenkins-1-afr0h         1/1       Running     0          3d
jenkins-3-build         0/1       Completed   0          3d
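
To follow the connect, terminate, and reconnect sequence in an agent log like the one above, it can help to split the output into one record per timestamp. The sketch below does this with a regular expression; the pattern is an assumption based only on the timestamp and level formats visible in this thread, and `split_records` is an illustrative name, not a Jenkins API. It relies on `re.split` accepting zero-width matches, which needs Python 3.7 or later.

```python
import re

# Timestamps as they appear in this thread, e.g. "Mar 14, 2016 5:50:49 PM".
TIMESTAMP = r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M"
RECORD = re.compile(
    "(" + TIMESTAMP + r") (.*?)\s(INFO|WARNING|SEVERE): (.*)", re.S
)


def split_records(text):
    """Return (timestamp, level, message) tuples from java.util.logging output."""
    records = []
    # Split just before each timestamp; text before the first one is skipped.
    for chunk in re.split("(?=" + TIMESTAMP + ")", text):
        match = RECORD.fullmatch(chunk.strip())
        if match:
            timestamp, _source, level, message = match.groups()
            records.append((timestamp, level, message.strip()))
    return records


sample = (
    "Mar 14, 2016 5:50:49 PM hudson.remoting.jnlp.Main$CuiListener status "
    "INFO: Terminated "
    "Mar 14, 2016 5:50:59 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$2$1 onReconnect "
    "INFO: Restarting slave via jenkins.slaves.restarter.UnixSlaveRestarter@561c0b36"
)
for record in split_records(sample):
    print(record)
```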

sabre1041 commented 8 years ago

@ganrad I discovered an issue with the system:deployer role: it did not have sufficient permission to delete pods at the conclusion of Jenkins jobs, which matches the issue you were facing.

I have updated the documentation to specify the edit role as the name of the role that should be applied to the service accounts.

If you used the default service account, please run the following command:

oc policy add-role-to-user edit system:serviceaccount:jenkins:default

If you created the jenkins service account, please run the following command:

oc policy add-role-to-user edit system:serviceaccount:jenkins:jenkins
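
Both commands follow the same pattern: the role is granted to a user named `system:serviceaccount:<namespace>:<account>`. As a small illustrative sketch (the helper names are hypothetical), the argument list can be assembled for any project and service account, assuming the `oc` client is installed and logged in when the command is actually run:

```python
import shlex


def service_account_user(namespace: str, account: str) -> str:
    """Build the user name OpenShift uses for a service account."""
    return f"system:serviceaccount:{namespace}:{account}"


def add_role_command(role: str, namespace: str, account: str) -> list:
    """Return the argv for granting a role to a service account."""
    return ["oc", "policy", "add-role-to-user", role,
            service_account_user(namespace, account)]


# Reproduce the two commands above for the jenkins project.
for account in ("default", "jenkins"):
    print(shlex.join(add_role_command("edit", "jenkins", account)))
```

The list form can be passed to `subprocess.run(..., check=True)` without going through a shell.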

Would you be able to apply this change and retest?

Thanks

itewk commented 8 years ago

With this change, the pod gets created, the job runs, and the pod gets destroyed.

I am seeing this in the jenkins master logs, but not sure if it is affecting anything:

INFO: Terminating Kubernetes instance for slave 7e7fed0a08
Mar 20, 2016 2:46:29 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Terminated Kubernetes instance for slave 7e7fed0a08
Mar 20, 2016 2:46:29 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave _terminate
INFO: Disconnected computer 7e7fed0a08
Mar 20, 2016 2:46:29 PM org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback call
SEVERE: Error in provisioning; slave=KubernetesSlave name: 
, template=org.csanchez.jenkins.plugins.kubernetes.PodTemplate@65092771
java.lang.IllegalStateException: Node was deleted, computer is null
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:376)
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:311)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Mar 20, 2016 2:46:29 PM hudson.remoting.AbstractByteArrayCommandTransport$1 handle
WARNING: Failed to construct Command
java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2332)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2801)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:40)
        at hudson.remoting.AbstractByteArrayCommandTransport$1.handle(AbstractByteArrayCommandTransport.java:61)
        at org.jenkinsci.remoting.nio.NioChannelHub$2.run(NioChannelHub.java:594)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Mar 20, 2016 2:46:32 PM hudson.slaves.NodeProvisioner$2 run
WARNING: Provisioned slave Kubernetes Pod Template failed to launch
java.lang.IllegalStateException: Node was deleted, computer is null
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:376)
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud$ProvisioningCallback.call(KubernetesCloud.java:311)
        at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

ganrad commented 8 years ago

@sabre1041 : I assigned the 'edit' role to 'default' service account & ran the jenkins build job - The jenkins-slave pod got spawned, the build executed+completed fine & the slave pod got deleted as well. Adding the 'edit' role to the master pod's service account solved the issue! Thanks much for your help :)

sabre1041 commented 8 years ago

@ganrad glad to hear. I am going to go ahead and close this issue

@itewk if you want to create a new issue for the exception you discovered, please feel free to do so

chris530 commented 6 years ago

Getting rid of the http:// worked for me, thx!

nickleefly commented 6 years ago

@sabre1041 I have a similar issue. Could you provide some help?

I am running the Jenkins master outside of the Kubernetes cluster. When I run the job, the pod is created, but I get the following error:

Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
Warning: SECRET is defined twice in command-line arguments and the environment variable
Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
Feb 13, 2018 7:02:03 AM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: jenkins-slave-n58w4-bdvl7
Feb 13, 2018 7:02:03 AM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Feb 13, 2018 7:02:03 AM hudson.remoting.Engine startEngine
WARNING: No Working Directory. Using the legacy JAR Cache location: /home/jenkins/.jenkins/cache/jars
Feb 13, 2018 7:02:03 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://MY_JENKINS_IP/]
Feb 13, 2018 7:02:04 AM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: Failed to connect to https://MY_JENKINS_IP/tcpSlaveAgentListener/: java.security.cert.CertificateException: No subject alternative names matching IP address MY_JENKINS_IP found
java.io.IOException: Failed to connect to https://MY_JENKINS_IP/tcpSlaveAgentListener/: java.security.cert.CertificateException: No subject alternative names matching IP address MY_JENKINS_IP found
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:199)
    at hudson.remoting.Engine.innerRun(Engine.java:518)
    at hudson.remoting.Engine.run(Engine.java:469)
Caused by: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative names matching IP address MY_JENKINS_IP found
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
    at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1959)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
    at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514)
    at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
    at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026)

It looks like the JNLP agent can't talk to the Jenkins master. Thanks.
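
The exception text says the certificate presented by the master has no subject alternative name matching the IP address the agent connects to. A minimal sketch of that check is shown below. It assumes a certificate dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`, and the certificate contents and addresses are made-up placeholders, not values from this thread.

```python
import ipaddress


def san_matches_ip(cert: dict, ip: str) -> bool:
    """Return True if the certificate lists `ip` as an IP subject alternative name."""
    wanted = ipaddress.ip_address(ip)
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "IP Address":
            continue
        try:
            if ipaddress.ip_address(value.strip()) == wanted:
                return True
        except ValueError:
            # Skip entries that are not well-formed IP addresses.
            continue
    return False


# Hypothetical certificate that only names a DNS host, not an IP address.
cert = {"subjectAltName": (("DNS", "jenkins.example.com"),)}
print(san_matches_ip(cert, "10.0.0.5"))
```

A certificate like this hypothetical one would be rejected when the Jenkins URL is an IP address, so either the URL would need to use a host name the certificate covers or the certificate would need an IP entry.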