rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

cannot clone ssh url "SSH agent requested but SSH_AUTH_SOCK not-specified" #2495

Closed joshuacox closed 3 weeks ago

joshuacox commented 3 weeks ago

Is there an existing issue for this?

Current Behavior

When cloning using the SSH identity I set up like this:

#!/bin/sh
kubectl create \
  secret generic ssh-key \
  -n fleet-default \
  --from-file=ssh-privatekey=aldrin.key \
  --type=kubernetes.io/ssh-auth
kubectl create \
  secret generic ssh-key \
  -n fleet-local \
  --from-file=ssh-privatekey=aldrin.key \
  --type=kubernetes.io/ssh-auth

I created the secret in both fleet-local and fleet-default after using just one namespace didn't work.
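For reference, a quick sanity check with plain kubectl (nothing Fleet-specific) to confirm the secret was created with the expected type and data key:

# Shows the secret type (should be kubernetes.io/ssh-auth) and the data key names
# (should include ssh-privatekey) without printing the key material itself
kubectl describe secret ssh-key -n fleet-local
kubectl describe secret ssh-key -n fleet-default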

for this repo:

git@github.com:rancher/fleet-examples

a plain command-line clone succeeds:

git clone git@github.com:rancher/fleet-examples
Cloning into 'fleet-examples'...
remote: Enumerating objects: 946, done.
remote: Counting objects: 100% (221/221), done.
remote: Compressing objects: 100% (112/112), done.
remote: Total 946 (delta 139), reused 126 (delta 109), pack-reused 725
Receiving objects: 100% (946/946), 1.78 MiB | 7.81 MiB/s, done.
Resolving deltas: 100% (460/460), done.

However, cloning using this yaml:

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: sample
  # This namespace is special and auto-wired to deploy to the local cluster
  namespace: fleet-local
spec:
  # Everything from this repo will be run in this cluster. You trust me right?
  repo: "git@github.com:rancher/fleet-examples"
  paths:
  - simple

fails in the logs with

$ kubectl logs -f $gitjob-pod-name -n cattle-fleet-system

level=error msg="Error fetching latest commit: error creating SSH agent: \"SSH agent requested but SSH_AUTH_SOCK not-specified\""

Expected Behavior

successful clone and deployment

Steps To Reproduce

Once Fleet is running on your cluster (I used Rancher to set it up for me), simply run kubectl apply -f example-ssh.yml with the YAML given above.

Environment

- Architecture: x86_64
- Fleet Version: 2.8.4
- Cluster: k3s
  - Provider: on-prem
  - Options: nothing special
  - Kubernetes Version: v1.28.10+k3s1

Logs

Full of entries like this:

time="2024-06-08T02:48:17Z" level=debug msg="Enqueueing gitjob fleet-local/sample in 15 seconds"
time="2024-06-08T02:48:27Z" level=error msg="Error fetching latest commit: error creating SSH agent: \"SSH agent requested but SSH_AUTH_SOCK not-specified\""

Anything else?

Notice my GitRepo YAML is merely the default example from the docs with the repo URL changed to the git+ssh style. In case anyone was wondering whether an ssh:// URL works instead, I tried the below as well:

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: sample
  # This namespace is special and auto-wired to deploy to the local cluster
  namespace: fleet-local
spec:
  # Everything from this repo will be run in this cluster. You trust me right?
  repo: "ssh://git@github.com/rancher/fleet-examples"
  paths:
  - simple

with identical log output:

time="2024-06-08T02:56:26Z" level=error msg="Error fetching latest commit: error creating SSH agent: \"SSH agent requested but SSH_AUTH_SOCK not-specified\""
time="2024-06-08T02:56:26Z" level=debug msg="Enqueueing gitjob fleet-local/sample in 15 seconds"
time="2024-06-08T02:56:26Z" level=error msg="Error fetching latest commit: error creating SSH agent: \"SSH agent requested but SSH_AUTH_SOCK not-specified\""
time="2024-06-08T02:56:26Z" level=debug msg="Enqueueing gitjob fleet-local/sample in 15 seconds"
Tortoaster commented 3 weeks ago

Ran into the same issue. The docs for the GitRepo resource mention a clientSecretName field that should hold the name of the secret containing your SSH key, but setting it didn't solve the problem for me (a rough sketch of what that looks like is below).

Note that the docs also mention that the default branch being tracked is master, which is accurate for fleet-examples, but is something to keep in mind for other repositories.
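For reference, a rough sketch of what setting that field looks like. This is just the sample GitRepo from above with clientSecretName pointing at the ssh-key secret created earlier; the field names come from the GitRepo docs:

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: sample
  namespace: fleet-local
spec:
  repo: "git@github.com:rancher/fleet-examples"
  # Name of the kubernetes.io/ssh-auth secret in the same namespace as this GitRepo
  clientSecretName: ssh-key
  # fleet-examples still uses master as its default branch, so this is optional here
  branch: master
  paths:
  - simple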

joshuacox commented 3 weeks ago

@Tortoaster tyvm for the link to that particular page. That option solves my issue, so I am closing. I can re-open if you think your issue would get better attention here, @Tortoaster, but you are probably better off opening a new one.