K8s Jenkins slave pod error "SEVERE: http://jenkins:8080/ provided port:50000 is not reachable"

Describe the bug:

The Jenkins master pod deployed successfully. However, when I trigger a Jenkins job and the Jenkins slave pod is created, its jnlp container fails with "port:50000 is not reachable". I assume this is caused by the Jenkins Kubernetes plugin configuration, which can also be set from values.yaml through the agent.* and controller.agent* settings (https://github.com/jenkinsci/helm-charts/blob/main/charts/jenkins/README.md#to-300).

Version of Helm and Kubernetes:

Helm Version:

```console
$ helm version
version.BuildInfo{Version:"v3.4.2", GitCommit:"23dd3af5e19a02d4f4baa5b2f242645a1a3af629", GitTreeState:"dirty", GoVersion:"go1.15.5"}
```

Kubernetes Version:

```console
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:08:32Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
```

Which version of the chart: 3.1.2

What happened:

The Jenkins Helm chart is deployed to AWS EKS worker nodes.

The Jenkins master and slaves worked until I had to redeploy the Jenkins pod, after the underlying EC2 instances were restarted to patch vulnerable Linux packages.

I installed the Jenkins Helm chart with the following command and overrides.yaml:

```console
$ helm install jenkins jenkins-3.1.2.tgz -n jenkins -f overrides.yaml
```

```yaml
controller:
  # use Docker-in-Docker Jenkins, so that the jenkins container can build Docker images inside it
  # image: mesosphere/jenkins-dind # https://hub.docker.com/r/mesosphere/jenkins-dind
  # tag: 0.9.0
  statefulSetLabels:
    app: jenkins  # needed for istio
    version: 2.0.0  # needed for istio
  serviceLabels:
    app: jenkins  # needed for istio
    version: 2.0.0  # needed for istio
  podLabels:
    app: jenkins  # needed for istio
    version: 2.0.0  # needed for istio
  additionalPlugins: # WARNING: uncommenting these causes the pod to crash with "cp -r not specified", so for now these plugins need to be installed manually
    - matrix-auth:2.6.4
    # - kubernetes:1.25.7
    # - workflow-job:2.39
    # - workflow-aggregator:2.6
    # - credentials-binding:1.23
    # - git:4.2.2
    # - configuration-as-code:1.41
    # - bitbucket:1.1.11 # https://plugins.jenkins.io/bitbucket/
    # - bitbucket-build-status-notifier:1.4.2 # https://plugins.jenkins.io/bitbucket-build-status-notifier/
    # - bitbucket-oauth:0.10
    # - docker-build-publish:1.554.2  # https://plugins.jenkins.io/docker-build-publish/
    # - amazon-ecr:1.6 # https://plugins.jenkins.io/amazon-ecr/
    # - slack:2.40 # https://plugins.jenkins.io/slack/
    # - blueocean:1.23.2 # https://plugins.jenkins.io/blueocean/
    # - disk-usage:0.28 # https://plugins.jenkins.io/disk-usage/
    # - ws-cleanup:0.38 # https://plugins.jenkins.io/ws-cleanup/
    # - timestamper:1.11.3 # https://plugins.jenkins.io/timestamper/
    # - build-timeout:1.20 # https://plugins.jenkins.io/build-timeout/
  JCasC:
    defaultConfig: false
  agentListenerPort: 50000
  agentListenerHostPort:
  agentListenerNodePort:
  disabledAgentProtocols:
    - JNLP-connect
    - JNLP2-connect
  # Kubernetes service type for the JNLP agent service
  # agentListenerServiceType is the Kubernetes Service type for the JNLP agent service,
  # either 'LoadBalancer', 'NodePort', or 'ClusterIP'
  # Note if you set this to 'LoadBalancer', you *must* define annotations to secure it. By default
  # this will be an external load balancer and allowing inbound 0.0.0.0/0, a HUGE
  # security risk:  https://github.com/kubernetes/charts/issues/1341
  agentListenerServiceType: "ClusterIP"
serviceAccount:
  name: jenkins
  # for Jenkins pod to assume IAM role (IRSA)
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::xxxx:role/EKSJenkinsRole"

persistence:
  existingClaim: jenkins-claim # the EFS CSI driver doesn't support dynamic provisioning, so the PV and PVC need to be pre-created. Ref: https://github.com/kubernetes-sigs/aws-efs-csi-driver
  # storageClass: efs # use EFS storageclass. If the storage class is set to null or left undefined (persistence.storageClass=), the default provisioner is used (gp2 on AWS, standard on GKE, AWS & OpenStack).
  size: 8Gi

agent:
  enabled: true
  defaultsProviderTemplate: ""
  # URL for connecting to the Jenkins controller
  jenkinsUrl:
  # connect to the specified host and port, instead of connecting directly to the Jenkins controller
  jenkinsTunnel:
  kubernetesConnectTimeout: 5
  kubernetesReadTimeout: 15
  maxRequestsPerHostStr: "32"
  namespace: jenkins
  image: "jenkins/inbound-agent"
  tag: "4.6-1"
  workingDir: "/home/jenkins"
  customJenkinsLabels: []
  # name of the secret to be used for image pulling
  imagePullSecretName:
  componentName: "jenkins-agent"
  websocket: false
  privileged: false
  runAsUser:
  runAsGroup:
  resources:
    requests:
      cpu: "512m"
      memory: "512Mi"
    limits:
      cpu: "512m"
      memory: "512Mi"
  # You may want to change this to true while testing a new image
  alwaysPullImage: false
  # Controls how agent pods are retained after the Jenkins build completes
  # Possible values: Always, Never, OnFailure
  podRetention: "Never"
  # You can define the volumes that you want to mount for this container
  # Allowed types are: ConfigMap, EmptyDir, HostPath, Nfs, PVC, Secret
  # Configure the attributes as they appear in the corresponding Java class for that type
  # https://github.com/jenkinsci/kubernetes-plugin/tree/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/volumes
  volumes: []
  # - type: ConfigMap
  #   configMapName: myconfigmap
  #   mountPath: /var/myapp/myconfigmap
  # - type: EmptyDir
  #   mountPath: /var/myapp/myemptydir
  #   memory: false
  # - type: HostPath
  #   hostPath: /var/lib/containers
  #   mountPath: /var/myapp/myhostpath
  # - type: Nfs
  #   mountPath: /var/myapp/mynfs
  #   readOnly: false
  #   serverAddress: "192.0.2.0"
  #   serverPath: /var/lib/containers
  # - type: PVC
  #   claimName: mypvc
  #   mountPath: /var/myapp/mypvc
  #   readOnly: false
  # - type: Secret
  #   defaultMode: "600"
  #   mountPath: /var/myapp/mysecret
  #   secretName: mysecret
  # Pod-wide environment, these vars are visible to any container in the agent pod

  # You can define the workspaceVolume that you want to mount for this container
  # Allowed types are: DynamicPVC, EmptyDir, HostPath, Nfs, PVC
  # Configure the attributes as they appear in the corresponding Java class for that type
  # https://github.com/jenkinsci/kubernetes-plugin/tree/master/src/main/java/org/csanchez/jenkins/plugins/kubernetes/volumes/workspace
  workspaceVolume: {}
  # - type: DynamicPVC
  #   configMapName: myconfigmap
  # - type: EmptyDir
  #   memory: false
  # - type: HostPath
  #   hostPath: /var/lib/containers
  # - type: Nfs
  #   readOnly: false
  #   serverAddress: "192.0.2.0"
  #   serverPath: /var/lib/containers
  # - type: PVC
  #   claimName: mypvc
  #   readOnly: false
  # Pod-wide environment, these vars are visible to any container in the agent pod
  envVars: []
  # - name: PATH
  #   value: /usr/local/bin
  nodeSelector: {}
  # Key Value selectors. Ex:
  # jenkins-agent: v1

  # Executed command when side container gets started
  command:
  args: "${computer.jnlpmac} ${computer.name}"
  # Side container name
  sideContainerName: "jnlp"
  # Doesn't allocate pseudo TTY by default
  TTYEnabled: false
  # Max number of spawned agent
  containerCap: 10
  # Pod name
  podName: "default"
  # Allows the Pod to remain active for reuse until the configured number of
  # minutes has passed since the last step was executed on it.
  idleMinutes: 0
  # Raw yaml template for the Pod. For example this allows usage of toleration for agent pods.
  # https://github.com/jenkinsci/kubernetes-plugin#using-yaml-to-define-pod-templates
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  yamlTemplate: ""
  # yamlTemplate: |-
  #   apiVersion: v1
  #   kind: Pod
  #   spec:
  #     tolerations:
  #     - key: "key"
  #       operator: "Equal"
  #       value: "value"
  # Defines how the raw yaml field gets merged with yaml definitions from inherited pod templates: merge or override
  yamlMergeStrategy: "override"
  # Timeout in seconds for an agent to be online
  connectTimeout: 100
  # Annotations to apply to the pod.
  annotations: {}

  # Below is the implementation of custom pod templates for the default configured kubernetes cloud.
  # Add a key under podTemplates for each pod template. Each key (prior to | character) is just a label, and can be any value.
  # Keys are only used to give the pod template a meaningful name.  The only restriction is they may only contain RFC 1123 \ DNS label
  # characters: lowercase letters, numbers, and hyphens. Each pod template can contain multiple containers.
  # For this pod templates configuration to be loaded the following values must be set:
  # controller.JCasC.defaultConfig: true
  # Best reference is https://<jenkins_url>/configuration-as-code/reference#Cloud-kubernetes. The example below creates a python pod template.
  podTemplates: {}
  #  python: |
  #    - name: python
  #      label: jenkins-python
  #      serviceAccount: jenkins
  #      containers:
  #        - name: python
  #          image: python:3
  #          command: "/bin/sh -c"
  #          args: "cat"
  #          ttyEnabled: true
  #          privileged: true
  #          resourceRequestCpu: "400m"
  #          resourceRequestMemory: "512Mi"
  #          resourceLimitCpu: "1"
  #          resourceLimitMemory: "1024Mi"
```

I followed the Kubernetes plugin docs to set up the Cloud config: https://github.com/jenkinsci/kubernetes-plugin

As in the screenshot, the cloud's "Test Connection" button succeeds, since the Jenkins pod runs inside the AWS EKS cluster.

When I trigger a Jenkins job, the slave pod terminates.

Here are the logs:

```console
$ k logs -n jenkins -c jnlp -f xxx-master-25-z0h57-2hfpd-7632l
Mar 18, 2021 8:29:30 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: xxx-master-25-z0h57-2hfpd-7632l
Mar 18, 2021 8:29:30 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 18, 2021 8:29:30 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Mar 18, 2021 8:29:30 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Mar 18, 2021 8:29:30 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Mar 18, 2021 8:29:30 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins:8080/]
Mar 18, 2021 8:29:30 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 18, 2021 8:29:35 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver isPortVisible
WARNING: connect timed out
Mar 18, 2021 8:29:35 PM hudson.remoting.jnlp.Main$CuiListener error
SEVERE: http://jenkins:8080/ provided port:50000 is not reachable
java.io.IOException: http://jenkins:8080/ provided port:50000 is not reachable
    at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:314)
    at hudson.remoting.Engine.innerRun(Engine.java:693)
    at hudson.remoting.Engine.run(Engine.java:518)
```
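The log suggests a two-step check: the agent first reads `/tcpSlaveAgentListener/` over HTTP (that part works, since it lists the accepted protocols), then tries a plain TCP connect to the advertised port, which times out about 5 seconds later (8:29:30 to 8:29:35). A minimal Python sketch of that second step, for repeating it by hand from a pod in the `jenkins` namespace. The function name and the 5-second default are assumptions inferred from the log timestamps, not taken from the remoting source:

```python
import socket


def is_port_visible(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper mirroring the agent's "is the advertised port reachable?" step;
    the default timeout is inferred from the log timestamps, not from the remoting code.
    """
    try:
        # create_connection resolves the name and opens a TCP socket; we close it immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False
```

Given the log, `is_port_visible("jenkins", 50000)` would be expected to return `False` from inside the namespace. If the chart's default naming applies, the listener on 50000 may be exposed by a separate `<release>-agent` Service (assumed here to be `jenkins-agent`) rather than by the `jenkins` Service, which can be checked the same way.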

I verified the /tcpSlaveAgentListener/ endpoint from a curl pod in the jenkins namespace:

```console
$ k apply -f ../../tests/pod_curl.yaml
$ k exec -it curl -n jenkins sh
/ $ curl jenkins:8080/tcpSlaveAgentListener/ -v
*   Trying 172.20.35.230:8080...
* Connected to jenkins (172.20.35.230) port 8080 (#0)
> GET /tcpSlaveAgentListener/ HTTP/1.1
> Host: jenkins:8080
> User-Agent: curl/7.75.0-DEV
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK   # <----- works!
< Date: Thu, 18 Mar 2021 19:49:34 GMT
< X-Content-Type-Options: nosniff
< Content-Type: text/plain;charset=utf-8
< X-Hudson-JNLP-Port: 50000
< X-Jenkins-JNLP-Port: 50000
< X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAplLpc8tR8VSYXA9MFqeJT7UQl8RjGhN9rnbhZJiK+RRkDIs9IsOX0vsdP6WuZkUHr49DxZYpuZOJcTDYoctzTr+jOS5JB7pGE6zpJI7YsrcS0f5S/Umlssdj5vYf6D3oHj1X/afrchvhWCJRRG94JIjxYjN0Cac5P8whd8Q2QoNPEncTY9MfDet8yn1PxXd0uq2LH8LbwOsDszsWOpxw2ACekpniauCWyw20B1WiAoj9l4DplyugvWCZQqCzl9ls0N7xe7FXZctMxP3IBZhh/zhoUbcS8y4tNP6fLNkLAVWMFyqYa6GVww7RpyGgnll9RCvQTR2K+cXzWBITop29pwIDAQAB
< X-Jenkins-Agent-Protocols: JNLP4-connect, Ping
< X-Remoting-Minimum-Version: 3.14
< Content-Length: 12
< Server: Jetty(9.4.33.v20201020)
<

   Jenkins
* Connection #0 to host jenkins left intact
```

However, /tcpSlaveAgentListener/ on the private endpoint (reached over AWS VPN) used to work but no longer does. I'm not sure whether this is related to the "provided port:50000 is not reachable" error.

```console
# used to work
$ curl http://internal-xxxx-xxxx.us-east-1.elb.amazonaws.com/tcpSlaveAgentListener/ -v
*   Trying 10.1.xx.xx...
* TCP_NODELAY set
* Connected to internal-xxxx-xxxx.us-east-1.elb.amazonaws.com (10.1.xx.xx) port 80 (#0)
> GET /tcpSlaveAgentListener/ HTTP/1.1
> Host: internal-xxxx-xxxx.us-east-1.elb.amazonaws.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< date: Fri, 12 Jun 2020 11:50:37 GMT
< x-content-type-options: nosniff
< content-type: text/plain;charset=utf-8
< x-hudson-jnlp-port: 50000
< x-jenkins-jnlp-port: 50000
< x-instance-identity: MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAuSNmwO+JEpFTaJvuIb5o8+gr311aFqAfRV8Hh97mJHZmGBqG7kGJf74tc6hr5cREVRD+vw8giqaUzyvALu4GomUVJFpo0PzCXaRjphRIjkdhis7oZ8utdtCl9CdNGr9yXVZq4hp+znCm3Rg9XNlJ1u8pWLGihk4vz+2phkXBQ0rOCk203L8KuQ8CeEgbSvSQHwtyiSUixAVO1AVZ0uWBNqBdzwKu6GuaAqAU1lUErJrxKk+NVqZJ5KiOAMnbVbsEwAou3ySIBZPeSsALsez/y2BKJfJD8gdvqRmVp6GNsYXU56IbsM9s8WyAmVwP85h52Svl8sSr3UsbNEOcZsy5VwIDAQAB
< x-jenkins-agent-protocols: JNLP4-connect, Ping
< x-remoting-minimum-version: 3.14
< content-length: 12
< server: istio-envoy
< x-envoy-upstream-service-time: 2
<

   Jenkins
```

```console
# right now doesn't work
$ curl http://internal-xxxx-xxxx.us-east-1.elb.amazonaws.com/tcpSlaveAgentListener/ -v
*   Trying 10.1.xx.xx...
* TCP_NODELAY set
* Connected to internal-xxxx-xxxx.us-east-1.elb.amazonaws.com (10.1.xx.xx) port 80 (#0)
> GET /tcpSlaveAgentListener/ HTTP/1.1
> Host: internal-xxxx-xxxx.us-east-1.elb.amazonaws.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< date: Thu, 18 Mar 2021 20:50:58 GMT
< server: istio-envoy
< Content-Length: 0
< Connection: keep-alive
<
* Connection #0 to host internal-xxx-xxxx.us-east-1.elb.amazonaws.com left intact
```
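These transcripts differ in who answers `/tcpSlaveAgentListener/`: in-cluster, Jetty returns `200` with `X-Jenkins-JNLP-Port: 50000`, while through the internal ELB `istio-envoy` now returns `404`. A minimal Python sketch of the same HTTP probe, returning the advertised port and protocols and surfacing a 404 as an error. The function name is hypothetical; the header names are the ones in the output above:

```python
import urllib.error
import urllib.request
from typing import List, Tuple


def probe_agent_listener(jenkins_url: str, timeout: float = 5.0) -> Tuple[int, List[str]]:
    """GET <jenkins_url>/tcpSlaveAgentListener/ and return (advertised port, protocols).

    urlopen raises urllib.error.HTTPError for non-2xx answers, so a 404 like the
    istio-envoy response surfaces as an exception with .code == 404.
    """
    url = jenkins_url.rstrip("/") + "/tcpSlaveAgentListener/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Header lookup is case-insensitive, so Jetty's "X-Jenkins-JNLP-Port" and
        # Envoy's lower-cased "x-jenkins-jnlp-port" are both found.
        port_header = resp.headers.get("X-Jenkins-JNLP-Port")
        if port_header is None:
            raise ValueError(f"{url} answered without an X-Jenkins-JNLP-Port header")
        raw = resp.headers.get("X-Jenkins-Agent-Protocols", "")
        protocols = [p.strip() for p in raw.split(",") if p.strip()]
        return int(port_header), protocols
```

Pointed at `http://jenkins:8080` from the curl pod, this should return `(50000, ["JNLP4-connect", "Ping"])`, matching the first transcript; pointed at the ELB URL it should raise `HTTPError` with code 404, matching the last one. A successful probe only shows that the HTTP side is fine; it says nothing about whether the advertised TCP port is reachable.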

I've tried setting JENKINS_URL=http://jenkins:8080, to no avail.

When I set JENKINS_TUNNEL=jenkins:50000, the Jenkins slave pod hangs instead:

```console
$ k logs -n jenkins -c jnlp -f xxx-master-24-ltvqp-48lxv-q122c
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up agent: xxxx-24-ltvqp-48lxv-q122c
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener <init>
INFO: Jenkins agent is running in headless mode.
Mar 18, 2021 8:28:40 PM hudson.remoting.Engine startEngine
INFO: Using Remoting version: 4.3
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.WorkDirManager initializeWorkDir
INFO: Using /home/jenkins/agent/remoting as a remoting work directory
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
INFO: Both error and output logs will be printed to /home/jenkins/agent/remoting
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [http://jenkins:8080/]
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting server accepts the following protocols: [JNLP4-connect, Ping]
Mar 18, 2021 8:28:40 PM org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver resolve
INFO: Remoting TCP connection tunneling is enabled. Skipping the TCP Agent Listener Port availability check
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins
  Agent port:    50000
  Identity:      fc:7f:01:98:49:4a:b5:ac:51:bd:73:6c:f7:b3:08:71
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Mar 18, 2021 8:28:40 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins:50000 # <------ hangs here for 2 mins and eventually the pod terminates
```
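Comparing the two agent logs: without a tunnel, the agent targets the host from `JENKINS_URL` on the port advertised in `X-Jenkins-JNLP-Port` and checks reachability first; with `JENKINS_TUNNEL` set, it skips that check (as the log states) and connects to the tunnel address. A hypothetical Python model of that resolution, assuming remoting's `-tunnel HOST:PORT` convention in which either side may be left empty (`:PORT` or `HOST:`) to fall back to the default. This is an illustration, not the remoting implementation:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def resolve_agent_endpoint(
    jenkins_url: str, advertised_port: int, tunnel: Optional[str] = None
) -> Tuple[str, int, bool]:
    """Return (host, port, check_port_first) for the agent's TCP connection.

    Hypothetical model:
    - no tunnel: host from JENKINS_URL, port from X-Jenkins-JNLP-Port, reachability checked first;
    - tunnel "HOST:PORT", ":PORT" or "HOST:": empty parts fall back to those defaults,
      and the reachability check is skipped.
    """
    default_host = urlsplit(jenkins_url).hostname
    if default_host is None:
        raise ValueError(f"no host in JENKINS_URL: {jenkins_url!r}")
    if not tunnel:
        return default_host, advertised_port, True
    if ":" not in tunnel:
        raise ValueError(f"tunnel must look like HOST:PORT, got {tunnel!r}")
    host, _, port = tunnel.rpartition(":")
    return (host or default_host), (int(port) if port else advertised_port), False
```

With the values from this report, both runs resolve to host `jenkins` and port `50000`. That suggests the hang at `Connecting to jenkins:50000` hits the same unreachable endpoint as the earlier timeout, only without the up-front check. A tunnel therefore only helps if it names an address where the listener is actually exposed, for example the separate agent Service assumed above.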

I've looked through and tried the following:

- https://stackoverflow.com/questions/44180595/tcpslaveagentlistener-not-found-on-jenkins-server
- https://stackoverflow.com/questions/58719522/tcpslaveagentlistener-is-invalid-404-not-found
- https://github.com/jenkinsci/docker/issues/788
- https://programmer.ink/think/installing-jenkins-on-k8s-and-common-problems.html
- https://issues.jenkins.io/browse/JENKINS-63832


jenkinsci/helm-charts #298