telepresenceio / telepresence

Local development against a remote Kubernetes or OpenShift cluster
https://www.telepresence.io

Command `telepresence helm install` not obeying value overrides #3642

Closed thecooldrop closed 4 months ago

thecooldrop commented 4 months ago

Describe the bug

When installing the Telepresence Helm chart through the Telepresence CLI with the telepresence helm install command, the Helm value overrides supplied via the --set and -f flags do not seem to be obeyed.

I ran into problems with securityContext while trying to debug ArgoCD: ArgoCD's securityContext is very restrictive, so the sidecar container fails to start as the root user. I tried installing Telepresence with telepresence helm install --set agent.securityContext={} and with telepresence helm install --set agent.securityContext=null, but in both cases the values were ignored, and the traffic-manager Deployment does not include the environment variables for overriding the agent security context.

To reproduce the bug, you can use the following values file:

values.yaml

agent:
  securityContext: 
    allowPrivilegeEscalation: true
    runAsNonRoot: false

Running telepresence helm install -f values.yaml then installs Telepresence with the following Deployment in the ambassador namespace:

kubectl describe deployments -n ambassador traffic-manager

Name:                   traffic-manager
Namespace:              ambassador
CreationTimestamp:      Mon, 08 Jul 2024 18:35:52 +0200
Labels:                 app=traffic-manager
                        app.kubernetes.io/created-by=Helm
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/version=2.19.0
                        helm.sh/chart=telepresence-2.19.0
                        telepresence=manager
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: traffic-manager
                        meta.helm.sh/release-namespace: ambassador
Selector:               app=traffic-manager,telepresence=manager
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=traffic-manager
                    telepresence=manager
  Service Account:  traffic-manager
  Containers:
   traffic-manager:
    Image:       docker.io/datawire/tel2:2.19.0
    Ports:       8081/TCP, 443/TCP
    Host Ports:  0/TCP, 0/TCP
    Environment:
      LOG_LEVEL:                    info
      REGISTRY:                     docker.io/datawire
      SERVER_PORT:                  8081
      POD_CIDR_STRATEGY:            auto
      MUTATOR_WEBHOOK_PORT:         443
      AGENT_INJECTOR_SECRET:        mutator-webhook-tls
      GRPC_MAX_RECEIVE_SIZE:        4Mi
      AGENT_ARRIVAL_TIMEOUT:        30s
      AGENT_INJECT_POLICY:          OnDemand
      AGENT_INJECTOR_NAME:          agent-injector
      AGENT_PORT:                   9900
      AGENT_APP_PROTO_STRATEGY:     http2Probe
      AGENT_IMAGE_PULL_POLICY:      IfNotPresent
      PROMETHEUS_PORT:              0
      MANAGER_NAMESPACE:             (v1:metadata.namespace)
      POD_IP:                        (v1:status.podIP)
      CLIENT_CONNECTION_TTL:        24h
      CLIENT_DNS_EXCLUDE_SUFFIXES:  .com .io .net .org .ru
    Mounts:                         <none>
  Volumes:                          <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   traffic-manager-f658f7fc (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  86s   deployment-controller  Scaled up replica set traffic-manager-f658f7fc to 1

In the Deployment description above, the environment variable AGENT_SECURITY_CONTEXT is not set on the traffic-manager container, although the current version of the Helm chart should set it.
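For anyone who wants to check this without reading the whole description by eye, here is a minimal sketch of a helper that scans `kubectl describe` output for a given environment variable name. The function name `has_env_var` is hypothetical (it is not part of Telepresence or kubectl), and it assumes the variable appears as `NAME:` at the start of a line after optional indentation, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl describe` text on stdin and exits 0
# if a line starts (after optional indentation) with "<NAME>:".
# $1 is the environment variable name to look for.
has_env_var() {
  grep -Eq "^[[:space:]]*$1:"
}

# Example use against a live cluster (namespace and Deployment name taken
# from this report):
#   kubectl describe deployments -n ambassador traffic-manager \
#     | has_env_var AGENT_SECURITY_CONTEXT \
#     && echo "override applied" || echo "override missing"
```

With the Deployment shown in this report, the check would be expected to report the override as missing, since AGENT_SECURITY_CONTEXT does not appear in the Environment section.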

Additional information about the setup:

telepresence version

OSS Client     : v2.19.0
OSS Root Daemon: v2.19.0
OSS User Daemon: v2.19.0
Traffic Manager: not connected

kubectl version -o yaml

clientVersion:
  buildDate: "2022-05-24T12:26:19Z"
  compiler: gc
  gitCommit: 3ddd0f45aa91e2f30c70734b175631bec5b5825a
  gitTreeState: clean
  gitVersion: v1.24.1
  goVersion: go1.18.2
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2024-02-14T22:24:00Z"
  compiler: gc
  gitCommit: 4b8e819355d791d96b7e9d9efe4cbafae2311c88
  gitTreeState: clean
  gitVersion: v1.29.2
  goVersion: go1.21.7
  major: "1"
  minor: "29"
  platform: linux/amd64

kind version:

kind v0.22.0 go1.20.13 linux/amd64

thallgren commented 4 months ago

I believe this was fixed by #3628, but it's not yet released.