kubernetes / git-sync

A sidecar app which clones a git repo and keeps it in sync with the upstream.
Apache License 2.0

git-sync sidecar container not work in bitnami spark helm chart #819

Closed jackchuong closed 9 months ago

jackchuong commented 9 months ago

Hi all, I want to add a git-sync sidecar container to the Spark master pod. It should sync data from my private Git repo periodically (every 60s) over SSH. Here are my chart values:

master:
  extraVolumes:
    - name: sparks-job-pv
      emptyDir: {}
    - name: ssh-key
      secret:
        defaultMode: 256
        secretName: spark-ssh-git-secret
  extraVolumeMounts:
    - name: sparks-job-pv
      mountPath: /opt/sparks-job
  sidecars:
    - name: git-sync-sparks-job
      image: k8s.gcr.io/git-sync:v3.1.5
      env:
        - name: GIT_SYNC_REPO
          value: "git@gitlab.mydomain.com:ebis1/sparks-job.git" ##repo-path-you-want-to-clone
        - name: GIT_SYNC_BRANCH
          value: "main" ##repo-branch
        - name: GIT_SYNC_SSH
          value: "true"
        - name: GIT_SYNC_ROOT
          value: /data
        - name: GIT_SYNC_DEST
          value: "sparks-job" ##path-where-you-want-to-clone
        - name: GIT_SYNC_ONE_TIME
          value: "false"
        - name: GIT_SYNC_PERIOD
          value: "60"
      securityContext:
        runAsUser: 0
      volumeMounts:
        - name: ssh-key
          mountPath: "/etc/git-secret"
        - name: sparks-job-pv
          mountPath: /data

I created a secret containing the SSH key:

$ kubectl -n ebis describe secret/spark-ssh-git-secret
Name:         spark-ssh-git-secret
Namespace:    ebis
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
known_hosts:  674 bytes
ssh:          2622 bytes

A user can already push to and pull from the Git repo with this SSH key. But the Spark master pod is not ready (1/2) and is in CrashLoopBackOff:

$ kubectl -n ebis get pod
pod/spark-master-0                          1/2     CrashLoopBackOff   2 (18s ago)   43s

$ kubectl -n ebis describe pod spark-master-0
Name:             spark-master-0
Namespace:        ebis
Priority:         0
Service Account:  spark
Node:             k3s-dc-worker1/192.168.0.201
Start Time:       Thu, 21 Sep 2023 13:29:45 +0700
Labels:           app.kubernetes.io/component=master
                  app.kubernetes.io/instance=spark
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=spark
                  controller-revision-hash=spark-master-686777bc98
                  helm.sh/chart=spark-7.2.1
                  statefulset.kubernetes.io/pod-name=spark-master-0
Annotations:      <none>
Status:           Running
IP:               10.42.1.23
IPs:
  IP:           10.42.1.23
Controlled By:  StatefulSet/spark-master
Containers:
  spark-master:
    Container ID:   containerd://29c9d50665a8422ac8343721fabc25c328c7391c20d5595c49fc256407d84daf
    Image:          docker.io/bitnami/spark:3.4.1-debian-11-r0
    Image ID:       docker.io/bitnami/spark@sha256:b09846b988dba82090b53455b33d557ef21923b13cab54e6029d87f604898b26
    Ports:          8080/TCP, 7077/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 21 Sep 2023 13:29:46 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  4Gi
    Requests:
      cpu:      2
      memory:   4Gi
    Liveness:   http-get http://:8080/ delay=180s timeout=5s period=20s #success=1 #failure=6
    Readiness:  http-get http://:8080/ delay=30s timeout=5s period=10s #success=1 #failure=6
    Environment:
      BITNAMI_DEBUG:            false
      SPARK_MODE:               master
      SPARK_DAEMON_MEMORY:
      SPARK_MASTER_PORT:        7077
      SPARK_MASTER_WEBUI_PORT:  8080
    Mounts:
      /mnt/certs from cert (ro)
      /opt/jars from ebis-pv (rw)
      /opt/sparks-job from sparks-job-pv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9g7d5 (ro)
  git-sync-sparks-job:
    Container ID:   containerd://08473509b094b0a4dffaed2e2266b57ae1b6dfff9e664c69104ed2269f064b48
    Image:          k8s.gcr.io/git-sync:v3.1.5
    Image ID:       k8s.gcr.io/git-sync@sha256:f38673f25b8e6a27b3518a34c304c9c3b10b9fc18b917c9e2b5f8f63c4da7cc6
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 21 Sep 2023 13:30:31 +0700
      Finished:     Thu, 21 Sep 2023 13:30:31 +0700
    Ready:          False
    Restart Count:  3
    Environment:
      GIT_SYNC_REPO:      git@gitlab.mydomain.com:ebis1/sparks-job.git
      GIT_SYNC_BRANCH:    main
      GIT_SYNC_SSH:       true
      GIT_SYNC_ROOT:      /data
      GIT_SYNC_DEST:      sparks-job
      GIT_SYNC_ONE_TIME:  false
      GIT_SYNC_PERIOD:    60
    Mounts:
      /data from sparks-job-pv (rw)
      /etc/git-secret from ssh-key (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9g7d5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  ebis-pv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebis-pvc
    ReadOnly:   false
  cert:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            secrets-store.csi.k8s.io
    FSType:
    ReadOnly:          true
    VolumeAttributes:      secretProviderClass=cert-spc
  sparks-job-pv:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  ssh-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  spark-ssh-git-secret
    Optional:    false
  kube-api-access-9g7d5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  66s                default-scheduler  Successfully assigned ebis/spark-master-0 to k3s-dc-worker1
  Normal   Pulled     66s                kubelet            Container image "docker.io/bitnami/spark:3.4.1-debian-11-r0" already present on machine
  Normal   Created    66s                kubelet            Created container spark-master
  Normal   Started    66s                kubelet            Started container spark-master
  Normal   Pulled     21s (x4 over 66s)  kubelet            Container image "k8s.gcr.io/git-sync:v3.1.5" already present on machine
  Normal   Created    21s (x4 over 65s)  kubelet            Created container git-sync-sparks-job
  Normal   Started    21s (x4 over 65s)  kubelet            Started container git-sync-sparks-job
  Warning  BackOff    5s (x6 over 63s)   kubelet            Back-off restarting failed container git-sync-sparks-job in pod spark-master-0_ebis(7f2a2716-a357-4045-9bb0-3c5c1a693631)

$ kubectl -n ebis logs -f -c git-sync-sparks-job spark-master-0
INFO: detected pid 1, running init handler
I0921 06:36:51.378406      13 main.go:322]  "level"=0 "msg"="starting up"  "args"=["/git-sync"]
I0921 06:36:51.378519      13 main.go:575]  "level"=0 "msg"="cloning repo"  "origin"="git@gitlab.mydomain.com:ebis1/sparks-job.git" "path"="/data"
E0921 06:36:51.504122      13 main.go:348]  "msg"="failed to sync repo, aborting" "error"="error running command: exit status 128: \"Cloning into '/data'...\\nfatal: Could not read from remote repository.\\n\\nPlease make sure you have the correct access rights\\nand the repository exists.\\n\""

Meanwhile, this test.yaml works fine

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: ebis
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-helloworld
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: www-data
      - name: git-sync
        image: k8s.gcr.io/git-sync:v3.1.5
        volumeMounts:
        - name: www-data
          mountPath: /data
        - name: git-secret
          mountPath: "/etc/git-secret"
        env:
        - name: GIT_SYNC_REPO
          value: "git@gitlab.mydomain.com:ebis1/sparks-job.git" ##repo-path-you-want-to-clone
        - name: GIT_SYNC_BRANCH
          value: "main" ##repo-branch
        - name: GIT_SYNC_SSH
          value: "true"
        - name: GIT_SYNC_ROOT
          value: /data
        - name: GIT_SYNC_DEST
          value:  "sparks-job" ##path-where-you-want-to-clone
        - name: GIT_SYNC_ONE_TIME
          value: "false"
        - name: GIT_SYNC_PERIOD
          value: "60"
        securityContext:
          runAsUser: 0
      volumes:
      - name: www-data
        emptyDir: {}
      - name: git-secret
        secret:
          defaultMode: 256
          secretName: spark-ssh-git-secret # your-ssh-key

The git-sync container syncs data from the Git repo every 60s, and I can see the data in the nginx-helloworld container at /usr/share/nginx/html. The nginx-deployment pod runs without error:

$ kubectl -n ebis get pod
nginx-deployment-7f8ccf868b-8dvbm       2/2     Running       0          94s

thockin commented 9 months ago

First, that's a pretty old git-sync (more than 3.5 years old!) - I would encourage you to use something more recent if you care about CVEs. v3.6.9 is the latest v3 and v4.0.0 is even more modern but has some incompatible changes.

From what I understand, git-sync is working, but your Spark CRD is not setting something up properly?

I don't know anything about that, and I am afraid you are not going to find much of an answer here. You can crank up the --v level to 6 and get better logs, which MIGHT help (again, that is such an old version it will be hard to say for sure). You can at least compare git-sync logs from the success case and the fail case.

Another approach might be to run it with command set to sleep and args set to inf, and then kubectl exec into a shell to see what is going wrong: permissions, secret not mounted, or who knows.
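
The debugging approach suggested above could look roughly like this in the chart values. This is only a sketch: it reuses the `master.sidecars` entry from the original post, and it assumes that the chart passes `command` and `args` through to the container spec unchanged and that the image ships a `sleep` that accepts `inf`.

```yaml
master:
  sidecars:
    - name: git-sync-sparks-job
      image: k8s.gcr.io/git-sync:v3.1.5
      # Temporary override for debugging only: keep the container alive
      # instead of starting git-sync, so it stops crash-looping.
      command: ["sleep"]
      args: ["inf"]
      # env, securityContext, and volumeMounts stay as in the original values.
```

With the container idle, `kubectl -n ebis exec -it -c git-sync-sparks-job spark-master-0 -- sh` opens a shell in which to inspect /etc/git-secret and run the clone by hand. The override should be removed again once the cause is found.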

jackchuong commented 9 months ago

Hi @thockin, I changed the git-sync image in both spark-master-0 and nginx-deployment to registry.k8s.io/git-sync/git-sync:v3.6.5. This is the log from the git-sync container in the spark-master-0 pod:

kubectl -n ebis logs -f -c git-sync-sparks-job spark-master-0
INFO: detected pid 1, running init handler
I0922 02:14:53.822649      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync"]
I0922 02:14:53.838928      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="git@gitlab.mydomain.com:ebis1/sparks-job.git" "path"="/data"
E0922 02:14:53.967181      12 main.go:547] "msg"="too many failures, aborting" "error"="Run(git clone -v --no-checkout -b main git@gitlab.mydomain.com:ebis1/sparks-job.git /data): exit status 128: { stdout: "", stderr: "Cloning into '/data'...\nFailed to add the ECDSA host key for IP address '192.168.0.11' to the list of known hosts (/etc/git-secret/known_hosts).\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\n@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @\r\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\r\nPermissions 0440 for '/etc/git-secret/ssh' are too open.\r\nIt is required that your private key files are NOT accessible by others.\r\nThis private key will be ignored.\r\nLoad key \"/etc/git-secret/ssh\": bad permissions\r\ngit@gitlab.mydomain.com: Permission denied (publickey).\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists." }" "failCount"=1

This is the log from the git-sync container in the nginx-deployment pod:

kubectl -n ebis logs -f -c git-sync nginx-deployment-5786486d45-sr6jb
INFO: detected pid 1, running init handler
I0922 02:11:15.442456      12 main.go:401] "level"=0 "msg"="starting up" "pid"=12 "args"=["/git-sync"]
I0922 02:11:15.478253      12 main.go:950] "level"=0 "msg"="cloning repo" "origin"="git@gitlab.mydomain.com:ebis1/sparks-job.git" "path"="/data"
I0922 02:11:15.807335      12 main.go:760] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="7a1b1b612486f7b6ce8a9bd98ae8abdc0fcef2b1"
I0922 02:11:15.825159      12 main.go:800] "level"=0 "msg"="adding worktree" "path"="/data/7a1b1b612486f7b6ce8a9bd98ae8abdc0fcef2b1" "branch"="origin/main"
I0922 02:11:15.830422      12 main.go:860] "level"=0 "msg"="reset worktree to hash" "path"="/data/7a1b1b612486f7b6ce8a9bd98ae8abdc0fcef2b1" "hash"="7a1b1b612486f7b6ce8a9bd98ae8abdc0fcef2b1"
I0922 02:11:15.830472      12 main.go:865] "level"=0 "msg"="updating submodules"

If I set the command to sleep in spark-master-0 so that the container starts successfully, I can see that spark-ssh-git-secret is mounted at /etc/git-secret/ssh and /etc/git-secret/known_hosts in the git-sync container:

ls /etc/git-secret
known_hosts  ssh

thockin commented 9 months ago

So again, the simple deployment is working but spark is messing something up, right?

Are the permissions right? SSH is very particular about who can read/write keys and it is saying "Permissions 0440 for '/etc/git-secret/ssh' are too open".

You may also want to set the KNOWN_HOSTS flag to "false".

jackchuong commented 9 months ago

The permissions are 777 in /etc/git-secret in both spark-master-0 (fails) and nginx-deployment (works):

ls -lh
total 0
lrwxrwxrwx 1 root 1001 18 Sep 22 03:52 known_hosts -> ..data/known_hosts
lrwxrwxrwx 1 root 1001 10 Sep 22 03:52 ssh -> ..data/ssh
chmod 600 known_hosts
chmod: changing permissions of 'known_hosts': Read-only file system
chmod 600 ssh
chmod: changing permissions of 'ssh': Read-only file system

I added

- name: GIT_SYNC_SSH_KNOWN_HOSTS
  value: "false"

But the issue still exists. I tried to exec into the git-sync-sparks-job container in pod spark-master-0:

kubectl -n ebis exec -it -c git-sync-sparks-job spark-master-0 -- sh
# cd /data
# ls
# git clone -v --no-checkout -b main git@gitlab.mydomain.com:ebis1/sparks-job.git
Cloning into 'sparks-job'...
The authenticity of host 'gitlab.mydomain.com (192.168.0.11)' can't be established.
ECDSA key fingerprint is SHA256:AZiBro/EmLwwohRn1ywqxl6RpyTvYdfOGpihbDSTVUc.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'gitlab.mydomain.com,192.168.0.11' (ECDSA) to the list of known hosts.
git@gitlab.mydomain.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

thockin commented 9 months ago

What are the permissions on the actual key file (after the symlink)?

Please run git-sync with -v6 to get more complete logs of the commands being run.

jackchuong commented 9 months ago

Sorry, I don't understand your question. These files are created by mounting secret/spark-ssh-git-secret into the container.

git clone -v6 --no-checkout -b main git@gitlab.mydomain.com:ebis1/sparks-job.git /data
Cloning into '/data'...
ssh: Could not resolve hostname gitlab.mydomain.com: No address associated with hostname
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I think this is the reason: somehow the git-sync container in pod spark-master-0 cannot resolve the domain gitlab.mydomain.com? But the git-sync container in pod nginx-deployment can, so it works fine.

thockin commented 9 months ago

Earlier it said the key file was 0440, which SSH does not like if the current UID is the same as the key file's owner. I bet the file is owned by root and you are running as root. Don't run as root if you can avoid it.

Failing DNS is yet another new failure mode, not at all what you showed before, so I am not sure how I can be of help.

Obviously, make DNS work, first. Then make sure permissions and UID are correct.
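
One way to follow the advice above on permissions and UID is sketched below. It is a fragment built on assumptions, not a verified fix. UID 65533 is the non-root user that the git-sync documentation uses in its SSH examples. The group 1001 in the `ls -lh` output suggests, but does not prove, that the pod sets an fsGroup, which would explain why a key requested with defaultMode 256 (octal 0400) shows up as 0440. If that is the case, a non-root process that is not the file's owner avoids the SSH ownership complaint and can still read the key through the group bit.

```yaml
master:
  sidecars:
    - name: git-sync-sparks-job
      image: registry.k8s.io/git-sync/git-sync:v3.6.5
      securityContext:
        # Assumption: 65533 is the image's non-root git-sync user.
        # This replaces runAsUser: 0 from the original values.
        runAsUser: 65533
      volumeMounts:
        - name: ssh-key
          mountPath: /etc/git-secret
          readOnly: true
        - name: sparks-job-pv
          mountPath: /data
      # env stays as in the original values.
```

Whether this is enough depends on how the chart configures the pod-level security context, so the effective mode of the key file (following the symlink) is still worth checking from inside the container.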

thockin commented 9 months ago

This does not appear to be a bug, per se, so I'm going to close this for house-keeping's sake. Please let me know if you still can't get it working.