jenkinsci / helm-charts

Jenkins helm charts
https://artifacthub.io/packages/helm/jenkinsci/jenkins
Apache License 2.0

K8s Jenkins slave pod error "SEVERE: http://jenkins:8080/ provided port:50000 is not reachable" #298

Closed hasakura12 closed 3 years ago

hasakura12 commented 3 years ago

Describe the bug:

The Jenkins master pod deployed successfully. But when I trigger a Jenkins job and the Jenkins slave pod gets created, the jnlp container errors out with "port:50000 is not reachable". I assume this is due to the Jenkins Kubernetes plugin config, which can also be set from values.yaml via the agent.* and controller.agent* settings (https://github.com/jenkinsci/helm-charts/blob/main/charts/jenkins/README.md#to-300).

Version of Helm and Kubernetes:

Helm Version:

```console
$ helm version
version.BuildInfo{Version:"v3.4.2", GitCommit:"23dd3af5e19a02d4f4baa5b2f242645a1a3af629", GitTreeState:"dirty", GoVersion:"go1.15.5"}
```

Kubernetes Version:

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:08:32Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
```

Which version of the chart: Chart version is 3.1.2.

What happened:

Jenkins helm chart deployed to AWS EKS K8s worker nodes.

The Jenkins master and slave worked until I had to re-deploy the Jenkins pod, after the underlying EC2 instance was restarted to patch vulnerable Linux packages.

I installed the Jenkins helm chart with the command and overrides.yaml below.

$ helm install jenkins jenkins-3.1.2.tgz -n jenkins -f overrides.yaml

overrides.yaml:

controller:
  # use Docker in Docker jenkins, so that jenkins container can build docker image inside
  # image: mesosphere/jenkins-dind # https://hub.docker.com/r/mesosphere/jenkins-dind
  # tag: 0.9.0
  statefulSetLabels:
    app: jenkins  # needed for istio
    version: 2.0.0  # needed for istio
  serviceLabels:
    app: jenkins  # needed for istio
    version: 2.0.0  # needed for istio
  podLabels:
    app: jenkins  # needed for istio
    version: 2.0.0  # needed for istio
  additionalPlugins: # WARNING: uncommenting out these will cause pod to crash due to "cp -r not specified". So for now, these plugins need be installed manually
    - matrix-auth:2.6.4
    # - kubernetes:1.25.7
    # - workflow-job:2.39
    # - workflow-aggregator:2.6
    # - credentials-binding:1.23
    # - git:4.2.2
    # - configuration-as-code:1.41
    # - bitbucket:.1.1.11 # https://plugins.jenkins.io/bitbucket/
    # - bitbucket-build-status-notifier:1.4.2 # https://plugins.jenkins.io/bitbucket-build-status-notifier/
    # - bitbucket-oauth:0.10
    # - docker-build-publish:1.554.2  # https://plugins.jenkins.io/docker-build-publish/
    # - amazon-ecr:1.6 # https://plugins.jenkins.io/amazon-ecr/
    # - slack:2.40 # https://plugins.jenkins.io/slack/
    # - blueocean:1.23.2 # https://plugins.jenkins.io/blueocean/
    # - disk-usage:0.28 # https://plugins.jenkins.io/disk-usage/
    # - ws-cleanup:0.38 # https://plugins.jenkins.io/ws-cleanup/
    # - timestamper:1.11.3 # https://plugins.jenkins.io/timestamper/
    # - build-timeout:1.20 # https://plugins.jenkins.io/build-timeout/
  JCasC:
    defaultConfig: false
  agentListenerPort: 50000
  agentListenerHostPort:
  agentListenerNodePort:
  disabledAgentProtocols:
    - JNLP-connect
    - JNLP2-connect
  # Kubernetes service type for the JNLP agent service
  # agentListenerServiceType is the Kubernetes Service type for the JNLP agent service,
  # either 'LoadBalancer', 'NodePort', or 'ClusterIP'
  # Note if you set this to 'LoadBalancer', you *must* define annotations to secure it. By default
  # this will be an external load balancer and allowing inbound 0.0.0.0/0, a HUGE
  # security risk:  https://github.com/kubernetes/charts/issues/1341
  agentListenerServiceType: "ClusterIP"
serviceAccount:
  name: jenkins
  # for Jenkins pod to assume IAM role (IRSA)
  annotations: 
    eks.amazonaws.com/role-arn: "arn:aws:iam::xxxx:role/EKSJenkinsRole"

persistence:
  existingClaim: jenkins-claim # efs csi driver doesn't support dynamic provisioning, so pv and pvc needs to be precreated. Ref: https://github.com/kubernetes-sigs/aws-efs-csi-driver
  # storageClass: efs # use EFS storageclass. If the storage class is set to null or left undefined (persistence.storageClass=), the default provisioner is used (gp2 on AWS, standard on GKE, AWS & OpenStack).
  size: 8Gi

agent:
  enabled: true
  defaultsProviderTemplate: ""
  # URL for connecting to the Jenkins controller
  jenkinsUrl:
  # connect to the specified host and port, instead of connecting directly to the Jenkins controller
  jenkinsTunnel:
  kubernetesConnectTimeout: 5
  kubernetesReadTimeout: 15
  maxRequestsPerHostStr: "32"
  namespace: jenkins
  image: "jenkins/inbound-agent"
  tag: "4.6-1"
  workingDir: "/home/jenkins"
  customJenkinsLabels: []
  # name of the secret to be used for image pulling
  imagePullSecretName:
  componentName: "jenkins-agent"
  websocket: false
  privileged: false
  runAsUser:
  runAsGroup:
  resources:
    requests:
      cpu: "512m"
      memory: "512Mi"
    limits:
      cpu: "512m"
      memory: "512Mi"
  # You may want to change this to true while testing a new image
  alwaysPullImage: false
  # Controls how agent pods are retained after the Jenkins build completes
  # Possible values: Always, Never, OnFailure
  podRetention: "Never"
  # You can define the volumes that you want to mount for this container
  # Allowed types are: ConfigMap, EmptyDir, HostPath, Nfs, PVC, Secret
  # Configure the attributes as they appear in the corresponding Java class for that type
  # https://github.com/jenkinsci/kubernetes-plugin/tree/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/volumes
  volumes: []
  # - type: ConfigMap
  #   configMapName: myconfigmap
  #   mountPath: /var/myapp/myconfigmap
  # - type: EmptyDir
  #   mountPath: /var/myapp/myemptydir
  #   memory: false
  # - type: HostPath
  #   hostPath: /var/lib/containers
  #   mountPath: /var/myapp/myhostpath
  # - type: Nfs
  #   mountPath: /var/myapp/mynfs
  #   readOnly: false
  #   serverAddress: "192.0.2.0"
  #   serverPath: /var/lib/containers
  # - type: PVC
  #   claimName: mypvc
  #   mountPath: /var/myapp/mypvc
  #   readOnly: false
  # - type: Secret
  #   defaultMode: "600"
  #   mountPath: /var/myapp/mysecret
  #   secretName: mysecret
  # Pod-wide environment, these vars are visible to any container in the agent pod

  # You can define the workspaceVolume that you want to mount for this container
  # Allowed types are: DynamicPVC, EmptyDir, HostPath, Nfs, PVC
  # Configure the attributes as they appear in the corresponding Java class for that type
  # https://github.com/jenkinsci/kubernetes-plugin/tree/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/volumes/workspace
  workspaceVolume: {}
  # - type: DynamicPVC
  #   configMapName: myconfigmap
  # - type: EmptyDir
  #   memory: false
  # - type: HostPath
  #   hostPath: /var/lib/containers
  # - type: Nfs
  #   readOnly: false
  #   serverAddress: "192.0.2.0"
  #   serverPath: /var/lib/containers
  # - type: PVC
  #   claimName: mypvc
  #   readOnly: false
  # Pod-wide environment, these vars are visible to any container in the agent pod
  envVars: []
  # - name: PATH
  #   value: /usr/local/bin
  nodeSelector: {}
  # Key Value selectors. Ex:
  # jenkins-agent: v1

  # Executed command when side container gets started
  command:
  args: "${computer.jnlpmac} ${computer.name}"
  # Side container name
  sideContainerName: "jnlp"
  # Doesn't allocate pseudo TTY by default
  TTYEnabled: false
  # Max number of spawned agent
  containerCap: 10
  # Pod name
  podName: "default"
  # Allows the Pod to remain active for reuse until the configured number of
  # minutes has passed since the last step was executed on it.
  idleMinutes: 0
  # Raw yaml template for the Pod. For example this allows usage of toleration for agent pods.
  # https://github.com/jenkinsci/kubernetes-plugin#using-yaml-to-define-pod-templates
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  yamlTemplate: ""
  # yamlTemplate: |-
  #   apiVersion: v1
  #   kind: Pod
  #   spec:
  #     tolerations:
  #     - key: "key"
  #       operator: "Equal"
  #       value: "value"
  # Defines how the raw yaml field gets merged with yaml definitions from inherited pod templates: merge or override
  yamlMergeStrategy: "override"
  # Timeout in seconds for an agent to be online
  connectTimeout: 100
  # Annotations to apply to the pod.
  annotations: {}

  # Below is the implementation of custom pod templates for the default configured kubernetes cloud.
  # Add a key under podTemplates for each pod template. Each key (prior to | character) is just a label, and can be any value.
  # Keys are only used to give the pod template a meaningful name.  The only restriction is they may only contain RFC 1123 \ DNS label
  # characters: lowercase letters, numbers, and hyphens. Each pod template can contain multiple containers.
  # For this pod templates configuration to be loaded the following values must be set:
  # controller.JCasC.defaultConfig: true
  # Best reference is https://<jenkins_url>/configuration-as-code/reference#Cloud-kubernetes. The example below creates a python pod template.
  podTemplates: {}
  #  python: |
  #    - name: python
  #      label: jenkins-python
  #      serviceAccount: jenkins
  #      containers:
  #        - name: python
  #          image: python:3
  #          command: "/bin/sh -c"
  #          args: "cat"
  #          ttyEnabled: true
  #          privileged: true
  #          resourceRequestCpu: "400m"
  #          resourceRequestMemory: "512Mi"
  #          resourceLimitCpu: "1"
  #          resourceLimitMemory: "1024Mi"

I followed the Kubernetes plugin documentation to set up the cloud configuration: https://github.com/jenkinsci/kubernetes-plugin

[Screenshots: "Screen Shot 2021-03-19 at 3 36 23 AM", "Screen Shot 2021-03-20 at 12 45 29 AM", "Screen Shot 2021-03-20 at 12 45 01 AM"]

As shown in the screenshot, the connection to Jenkins succeeds when I use the "Test Connection" button, because the Jenkins pod runs inside the AWS EKS cluster.

When I trigger a Jenkins job, the slave pod terminates.

Here are the logs:

```console
$ k logs -n jenkins -c jnlp -f xxx-master-25-z0h57-2hfpd-7632l
Mar 18, 2021 8:29:30 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: xxx-master-25-z0h57-2hfpd-7632l
Mar 18, 2021 8:29:30 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 18, 2021 8:29:30 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Mar 18, 2021 8:29:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Mar 18, 2021 8:29:30 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Mar 18, 2021 8:29:30 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins:8080/]
Mar 18, 2021 8:29:30 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 18, 2021 8:29:35 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver isPortVisible
WARNING: connect timed out
Mar 18, 2021 8:29:35 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://jenkins:8080/ provided port:50000 is not reachable
java.io.IOException: http://jenkins:8080/ provided port:50000 is not reachable
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:314)
    at hudson.remoting.Engine.innerRun(Engine.java:693)
    at hudson.remoting.Engine.run(Engine.java:518)
```
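The `WARNING: connect timed out` line in the log above suggests that the agent's TCP connection to port 50000 received no answer at all, rather than being actively refused. Below is a minimal Python sketch that draws the same distinction by hand. It is not part of the chart or of the remoting library: the function name `probe_tcp` is hypothetical, and the 5-second default timeout is an assumption chosen to match the roughly five-second gap between the log timestamps.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a single TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <detail>".
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply within the timeout: packets are probably being dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host replied, but nothing accepts connections on that port.
        return "refused"
    except OSError as exc:
        # Anything else, for example a DNS lookup failure or unreachable network.
        return f"error: {exc}"
```

Run from a pod in the `jenkins` namespace that has Python available, `probe_tcp("jenkins", 50000)` would repeat the check that fails in the log. A `"timeout"` result would point toward something silently dropping the traffic (for example a security group, a network policy, or a Service that does not expose that port), while `"refused"` would point toward nothing listening on the port.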

I verified the /tcpSlaveAgentListener endpoint from a curl pod in the jenkins namespace:

```console
$ k apply -f ../../tests/pod_curl.yaml

$ k exec -it curl -n jenkins sh
/ $ curl jenkins:8080/tcpSlaveAgentListener/ -v
*   Trying 172.20.35.230:8080...
* Connected to jenkins (172.20.35.230) port 8080 (#0)
> GET /tcpSlaveAgentListener/ HTTP/1.1
> Host: jenkins:8080
> User-Agent: curl/7.75.0-DEV
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK   # <----- works!
< Date: Thu, 18 Mar 2021 19:49:34 GMT
< X-Content-Type-Options: nosniff
< Content-Type: text/plain;charset=utf-8
< X-Hudson-JNLP-Port: 50000
< X-Jenkins-JNLP-Port: 50000
< X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAplLpc8tR8VSYXA9MFqeJT7UQl8RjGhN9rnbhZJiK+RRkDIs9IsOX0vsdP6WuZkUHr49DxZYpuZOJcTDYoctzTr+jOS5JB7pGE6zpJI7YsrcS0f5S/Umlssdj5vYf6D3oHj1X/afrchvhWCJRRG94JIjxYjN0Cac5P8whd8Q2QoNPEncTY9MfDet8yn1PxXd0uq2LH8LbwOsDszsWOpxw2ACekpniauCWyw20B1WiAoj9l4DplyugvWCZQqCzl9ls0N7xe7FXZctMxP3IBZhh/zhoUbcS8y4tNP6fLNkLAVWMFyqYa6GVww7RpyGgnll9RCvQTR2K+cXzWBITop29pwIDAQAB
< X-Jenkins-Agent-Protocols: JNLP4-connect, Ping
< X-Remoting-Minimum-Version: 3.14
< Content-Length: 12
< Server: Jetty(9.4.33.v20201020)
<

   Jenkins * Connection #0 to host jenkins left intact
```
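The response above advertises the agent port in the `X-Jenkins-JNLP-Port` and `X-Hudson-JNLP-Port` headers, so a 200 on port 8080 only tells the agent which port to try next; it does not show that port 50000 itself is reachable. The Python sketch below extracts that port from a set of response headers. The function name `advertised_agent_port` is hypothetical, and checking the `X-Jenkins-` header before the `X-Hudson-` one is an assumption, not a statement about how the remoting library is implemented.

```python
from typing import Mapping, Optional


def advertised_agent_port(headers: Mapping[str, str]) -> Optional[int]:
    """Return the agent listener port advertised in the headers, or None."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumed preference order: newer header name first, legacy name second.
    for name in ("x-jenkins-jnlp-port", "x-hudson-jnlp-port"):
        value = lowered.get(name)
        if value is not None and value.strip().isdigit():
            return int(value.strip())
    return None
```

With the headers from the curl output, `advertised_agent_port({"X-Jenkins-JNLP-Port": "50000"})` yields `50000`, which is the port that then has to accept a plain TCP connection from the agent pod.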

However, the /tcpSlaveAgentListener endpoint on the private load balancer (reached over AWS VPN) used to work and no longer does. I am not sure whether this is related to the error "provided port:50000 is not reachable".

```console
# used to work
$ curl http://internal-xxxx-xxxx.us-east-1.elb.amazonaws.com/tcpSlaveAgentListener/ -v
*   Trying 10.1.xx.xx...
* TCP_NODELAY set
* Connected to internal-xxxx-xxxx.us-east-1.elb.amazonaws.com (10.1.xx.xx) port 80 (#0)
> GET /tcpSlaveAgentListener/ HTTP/1.1
> Host: internal-xxxx-xxxx.us-east-1.elb.amazonaws.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< date: Fri, 12 Jun 2020 11:50:37 GMT
< x-content-type-options: nosniff
< content-type: text/plain;charset=utf-8
< x-hudson-jnlp-port: 50000
< x-jenkins-jnlp-port: 50000
< x-instance-identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuSNmwO+JEpFTaJvuIb5o8+gr311aFqAfRV8Hh97mJHZmGBqG7kGJf74tc6hr5cREVRD+vw8giqaUzyvALu4GomUVJFpo0PzCXaRjphRIjkdhis7oZ8utdtCl9CdNGr9yXVZq4hp+znCm3Rg9XNlJ1u8pWLGihk4vz+2phkXBQ0rOCk203L8KuQ8CeEgbSvSQHwtyiSUixAVO1AVZ0uWBNqBdzwKu6GuaAqAU1lUErJrxKk+NVqZJ5KiOAMnbVbsEwAou3ySIBZPeSsALsez/y2BKJfJD8gdvqRmVp6GNsYXU56IbsM9s8WyAmVwP85h52Svl8sSr3UsbNEOcZsy5VwIDAQAB
< x-jenkins-agent-protocols: JNLP4-connect, Ping
< x-remoting-minimum-version: 3.14
< content-length: 12
< server: istio-envoy
< x-envoy-upstream-service-time: 2
<
   Jenkins

# right now doesn't work
$ curl http://internal-xxxx-xxxx.us-east-1.elb.amazonaws.com/tcpSlaveAgentListener/ -v
*   Trying 10.1.xx.xx...
* TCP_NODELAY set
* Connected to internal-xxxx-xxxx.us-east-1.elb.amazonaws.com (10.1.xx.xx) port 80 (#0)
> GET /tcpSlaveAgentListener/ HTTP/1.1
> Host: internal-xxxx-xxxx.us-east-1.elb.amazonaws.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Thu, 18 Mar 2021 20:50:58 GMT
< server: istio-envoy
< Content-Length: 0
< Connection: keep-alive
<
* Connection #0 to host internal-xxx-xxxx.us-east-1.elb.amazonaws.com left intact
```

I've tried setting JENKINS_URL=http://jenkins:8080, to no avail.

When I set JENKINS_TUNNEL=jenkins:50000, the Jenkins slave pod hangs:

```console
$ k logs -n jenkins -c jnlp -f xxx-master-24-ltvqp-48lxv-q122c
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: xxxx-24-ltvqp-48lxv-q122c
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 18, 2021 8:28:40 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins:8080/]
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins
  Agent port: 50000
  Identity: fc:7f:01:98:49:4a:b5:ac:51:bd:73:6c:f7:b3:08:71
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins:50000 # <------ hangs here for 2 mins and eventually pod terminates
```

I've looked through and tried these:

- https://stackoverflow.com/questions/44180595/tcpslaveagentlistener-not-found-on-jenkins-server
- https://stackoverflow.com/questions/58719522/tcpslaveagentlistener-is-invalid-404-not-found
- https://github.com/jenkinsci/docker/issues/788
- https://programmer.ink/think/installing-jenkins-on-k8s-and-common-problems.html
- https://issues.jenkins.io/browse/JENKINS-63832

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know:

josiahhaswell commented 3 years ago

Have you tried using the Kubernetes internal service DNS names? The Jenkins URL should be http://jenkins.<namespace>.svc.cluster.local:8080 and the tunnel should be jenkins.<namespace>.svc.cluster.local:50000.
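To make the suggestion concrete, here is a small Python sketch that builds both values for a given namespace. It is an illustration only: the function name `agent_connection_settings` is hypothetical, the dictionary keys mirror the `agent.jenkinsUrl` and `agent.jenkinsTunnel` entries in the overrides.yaml from the report, and `cluster.local` is assumed to be the cluster domain. The optional `agent_service` parameter is also an assumption, added in case the Service that exposes port 50000 has a different name from the HTTP Service; `kubectl get svc -n <namespace>` would show which Service carries which port.

```python
from typing import Dict, Optional


def agent_connection_settings(
    namespace: str,
    service: str = "jenkins",
    agent_service: Optional[str] = None,
    http_port: int = 8080,
    agent_port: int = 50000,
    cluster_domain: str = "cluster.local",
) -> Dict[str, str]:
    """Build the Jenkins URL and tunnel from in-cluster service DNS names."""
    # Fall back to the HTTP service name when no separate agent service is given.
    tunnel_service = agent_service or service
    http_host = f"{service}.{namespace}.svc.{cluster_domain}"
    tunnel_host = f"{tunnel_service}.{namespace}.svc.{cluster_domain}"
    return {
        "jenkinsUrl": f"http://{http_host}:{http_port}",
        "jenkinsTunnel": f"{tunnel_host}:{agent_port}",
    }
```

For the `jenkins` namespace used in the report, `agent_connection_settings("jenkins")` gives `http://jenkins.jenkins.svc.cluster.local:8080` as the URL and `jenkins.jenkins.svc.cluster.local:50000` as the tunnel.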

Example working configuration:

image

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue is being automatically closed due to inactivity.