Intercept with --service option not working when multiple service matching selector

lorenzo-milicia commented 1 month ago

Describe the bug

Running telepresence intercept on a deployment with multiple services that match the selector doesn't work, even when using the --service option.

I'm trying to run the intercept against a deployment with two different services matching the selectors. As expected, if I don't specify which service I want to intercept, I receive the error:

Deployment <name> has multiple interceptable services with port <port>.
Please specify the service you want to intercept by passing the --service=<svc> flag.

When I run the command with the added option --service <svc1> I still get:

Warning FailedCreate Error creating: admission webhook "agent-injector.getambassador.io" denied the request: found multiple services with a selector matching labels map[REDACTED] in namespace <namespace>, use --service and one of: <svc1>,<svc2>

This is the only thing that shows up in the logs gathered with telepresence loglevel debug.

Expected behavior

I expect to be able to run the intercept by specifying the Service to intercept.

Versions (please complete the following information):

Output of telepresence version:

OSS Client : v2.18.0
OSS Root Daemon: v2.18.0
OSS User Daemon: v2.18.0
Traffic Manager: v2.18.0
Traffic Agent : not reported by traffic-manager

Operating system of workstation running telepresence commands: both Windows 11 and Fedora under WSL
Kubernetes environment and version: Kubernetes hosted on Azure

thallgren commented 1 month ago

@lorenzo-milicia I'm trying to set something up to replicate the problem, and would appreciate it if you could answer these questions:

Do the services use the same targetPort?
If the answer is no, does it work if you specify the port?
Are the services using symbolic or numeric target ports?

lorenzo-milicia commented 1 month ago

@thallgren The two services point to the same port, but one uses a named targetPort, while the other uses the explicit port number (8080).

lorenzo-milicia commented 1 month ago

I tried making the targetPorts match, using both the numeric and the symbolic form, but I still get the same error.

thallgren commented 1 month ago

I've tried every conceivable combination of two services that use the same deployment and the same port, and I'm not able to reproduce this. Would you mind sharing the manifests for your services and deployment (with sensitive details redacted, of course; I don't need to know what image you're running, etc.)?

lorenzo-milicia commented 1 month ago

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: 
  labels:
    app: backend
    app.kubernetes.io/managed-by: Helm
  annotations:
    AppVersion: 0.16072.1586813-5298-6d1b19c2
    ...
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        AppVersion: 0.16072.1586813-5298-6d1b19c2
        app: backend
      annotations:
      ...
    spec:
      initContainers:
      ...
      containers:
        - name: backend
          image: 
          ports:
            - name: backend-tcp
              containerPort: 8080
              protocol: TCP
            - name: jdwp
              containerPort: 5005
              protocol: TCP
          env:
          resources:
          livenessProbe:
          readinessProbe:
          startupProbe:
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      schedulerName: default-scheduler

Service 1:

apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: 
  labels:
    app: backend
    app.kubernetes.io/managed-by: Helm
    metrics: exposed
  annotations:
    meta.helm.sh/release-name: 
    meta.helm.sh/release-namespace: 
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: backend-tcp
  selector:
    app: backend
  type: ClusterIP

Service 2:

apiVersion: v1
kind: Service
metadata:
  name: backend-2
  namespace: 
  labels:
    app: backend
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: 
    meta.helm.sh/release-namespace: 
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: backend
    AppVersion: 0.16072.1586813-5298-6d1b19c2
  type: ClusterIP
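On the earlier targetPort question: Kubernetes resolves a named targetPort against the port names declared on the pod's containers, so both Services above should end up at the same container port. Below is a minimal sketch of that lookup, with the port table copied from the Deployment above. `resolve_target_port` is a hypothetical helper written for illustration, not a Kubernetes or Telepresence function.

```python
from typing import Dict, Optional, Union

# Container ports declared by the "backend" container in the Deployment above.
CONTAINER_PORTS: Dict[str, int] = {"backend-tcp": 8080, "jdwp": 5005}


def resolve_target_port(
    target_port: Union[int, str], container_ports: Dict[str, int]
) -> Optional[int]:
    """Return the numeric container port that a Service targetPort refers to.

    A numeric targetPort is used as-is; a named one is looked up among the
    container port names. None means the name is not declared on the pod.
    """
    if isinstance(target_port, int):
        return target_port
    return container_ports.get(target_port)


# Service "backend" uses the symbolic form, "backend-2" the numeric one.
print(resolve_target_port("backend-tcp", CONTAINER_PORTS))
print(resolve_target_port(8080, CONTAINER_PORTS))
```

Both calls resolve to the same container port, which is consistent with the report that the two Services point at the same port despite the different targetPort styles.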

thallgren commented 1 month ago

I'm still not able to reproduce. I applied the manifest below, based on your services and deployment, in namespace "lorenzo". Everything works as expected when I do the following:

➜  telepresence connect -n lorenzo                      
Connected to context kind-dev, namespace lorenzo (https://127.0.0.1:46713)
➜  telepresence list
backend: ready to intercept (traffic-agent not yet installed)
➜  telepresence intercept backend
telepresence intercept: error: connector.CreateIntercept: Deployment backend.lorenzo has multiple interceptable service ports.
Please specify the service and/or service port you want to intercept by passing the --service=<svc> and/or --port=<local:svcPortName> flag.
➜  telepresence intercept backend --service backend
Using Deployment backend
   Intercept name         : backend
   State                  : ACTIVE
   Workload kind          : Deployment
   Destination            : 127.0.0.1:8080
   Service Port Identifier: http
   Volume Mount Point     : /tmp/telfs-843133341
   Intercepting           : all TCP connections

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: 
  labels:
    app: backend
  annotations:
    AppVersion: 0.16072.1586813-5298-6d1b19c2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        AppVersion: 0.16072.1586813-5298-6d1b19c2
        app: backend
    spec:
      containers:
        - name: backend
          image: docker.io/thhal/echo-server:latest
          ports:
            - name: backend-tcp
              containerPort: 8080
              protocol: TCP
            - name: jdwp
              containerPort: 5005
              protocol: TCP
          env:
            - name: PORTS
              value: "8080,8081"
          resources:
---
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace:
  labels:
    app: backend
    app.kubernetes.io/managed-by: Helm
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: backend-tcp
  selector:
    app: backend
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: backend-2
  namespace:
  labels:
    app: backend
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: backend
    AppVersion: 0.16072.1586813-5298-6d1b19c2
  type: ClusterIP

thallgren commented 1 month ago

Can you check if you have an entry for this deployment in telepresence-agents config-map, and if so, please show what it looks like?

lorenzo-milicia commented 1 month ago

The config map telepresence-agents does not contain any data at all. The only difference I can see from your test is that I have the traffic manager installed in the same namespace, so I need to use the --manager-namespace option as well.

lorenzo-milicia commented 1 month ago

By the way, we never had this issue with telepresence 2.5.8; it showed up only now that I tried to upgrade to 2.18.0.

thallgren commented 1 month ago

You are using an enterprise traffic-manager version 2.18.0. You'll need the OSS manager 2.18.0 (it corresponds to enterprise 2.19.x).

Please note that the Helm chart that we publish is for our enterprise version. When using OSS, you'll need to install with telepresence helm install.

The output for telepresence version on my machine is:

➜  telepresence version
OSS Client         : v2.18.0
OSS Root Daemon    : v2.18.0
OSS User Daemon    : v2.18.0
OSS Traffic Manager: v2.18.0
Traffic Agent      : docker.io/datawire/tel2:2.18.0

but your version output lacks the OSS prefix of the "Traffic Manager" and also prints this:

Traffic Agent : not reported by traffic-manager

This in turn means that you don't have the fix provided in #3436.
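The two signs described here can be checked mechanically. The following is a rough sketch that scans `telepresence version` output for them. `check_version_output` is a hypothetical helper, and the heuristic is based only on the outputs quoted in this thread, not on any documented output format.

```python
from typing import List


def check_version_output(output: str) -> List[str]:
    """Return warnings for the two signs of a non-OSS traffic manager named above."""
    warnings = []
    for line in output.splitlines():
        # Split on the first colon only, so image references such as
        # "docker.io/datawire/tel2:2.18.0" stay intact in the value.
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name == "Traffic Manager":
            warnings.append("Traffic Manager line lacks the OSS prefix")
        if name == "Traffic Agent" and value == "not reported by traffic-manager":
            warnings.append("Traffic Agent is not reported by the traffic-manager")
    return warnings


# Version output from the original report in this issue.
reported = """OSS Client : v2.18.0
OSS Root Daemon: v2.18.0
OSS User Daemon: v2.18.0
Traffic Manager: v2.18.0
Traffic Agent : not reported by traffic-manager"""

for warning in check_version_output(reported):
    print(warning)
```

Run against the output from the original report, the sketch flags both signs; run against the OSS output shown above, it returns an empty list.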

lorenzo-milicia commented 1 month ago

Wow, what a catch! I'll try to install the OSS traffic manager and update you. Does this mean that the traffic manager OSS cannot be installed through helm in any way?

thallgren commented 1 month ago

I believe that you can create the chart as a tgz using commands in the makefile, and then use that. But the recommended way is to use the telepresence helm command. It uses Helm (embedded) and an embedded chart that corresponds to the binary.

lorenzo-milicia commented 1 month ago

Unfortunately I still get the same error! The output of telepresence version is:

OSS Client         : v2.18.0
OSS Root Daemon    : v2.18.0
OSS User Daemon    : v2.18.0
OSS Traffic Manager: v2.18.0
Traffic Agent      : docker.io/datawire/tel2:2.18.0

But I still get:

telepresence intercept: error: connector.CreateIntercept: Error creating: admission webhook "agent-injector.getambassador.io" denied the request: found multiple services with a selector matching labels map[AppVersion:0.16072.1586813-5298-6d1b19c2 app:backend pod-template-hash:5f6899d865 telepresence.io/workloadEnabled:true telepresence.io/workloadName:backend] in namespace <namespace>, use --service and one of: backend,backend2
Hint: if the error mentions resource quota, the traffic-agent's requested resources can be configured by providing values to telepresence helm install
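To see why the webhook reports two candidates, recall that a Service selects a pod when every key/value pair in its selector appears in the pod's labels. Below is a minimal sketch of that rule, using the labels listed in the error above and the selectors from the manifests shared earlier in the thread. `selector_matches` is a hypothetical helper for illustration; it is not Telepresence's actual implementation.

```python
from typing import Dict


def selector_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True when every selector key/value pair is present in the pod labels.

    A Service without a selector does not select any pods, so an empty
    selector is treated as a non-match.
    """
    if not selector:
        return False
    return all(labels.get(key) == value for key, value in selector.items())


# Pod labels as printed in the admission webhook error.
POD_LABELS = {
    "AppVersion": "0.16072.1586813-5298-6d1b19c2",
    "app": "backend",
    "pod-template-hash": "5f6899d865",
    "telepresence.io/workloadEnabled": "true",
    "telepresence.io/workloadName": "backend",
}

# Selectors of the two Services from the manifests in this thread.
SELECTORS = {
    "backend": {"app": "backend"},
    "backend-2": {
        "app": "backend",
        "AppVersion": "0.16072.1586813-5298-6d1b19c2",
    },
}

matching = [name for name, sel in SELECTORS.items() if selector_matches(sel, POD_LABELS)]
print(matching)  # prints ['backend', 'backend-2']
```

Both selectors are subsets of the pod's labels, so both Services match the pod. The ambiguity itself is therefore expected; the open question in this issue is why the --service choice is not honored.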

telepresenceio / telepresence, issue #3602