rancher / dashboard

The Rancher UI
https://rancher.com
Apache License 2.0

GitRepo in stalled condition Git Updating when using SSH #7746

Open izaac opened 2 years ago

izaac commented 2 years ago

Rancher Server Setup

Information about the Cluster

User Information

Describe the bug

When adding a GitRepo, it goes into the Active state, and I can see that it picks up the correct commit ID from the repository. After some time, however, it goes into the Git Updating state and stays there indefinitely.

To Reproduce

Result

Expected Result

Screenshots

[Screenshots: "Screen Shot 2022-07-08 at 5 57 03 PM", "Screen Shot 2022-07-08 at 5 57 22 PM"]

Additional context

Rancher version v2.6.6 has fleet v0.3.9
Rancher version v2.6.2 has fleet v0.3.7

GitRepo Conditions

status:
  commit: 763fe7f7defce8f7878469efe5a2b213e4432cbd
  conditions:
  - lastUpdateTime: "2022-07-11T15:40:23Z"
    status: "True"
    type: Ready
  - lastUpdateTime: "2022-07-11T16:45:33Z"
    status: "True"
    type: Accepted
  - lastUpdateTime: "2022-07-11T15:40:24Z"
    status: "True"
    type: ImageSynced
  - lastUpdateTime: "2022-07-11T15:40:24Z"
    status: "False"
    type: Reconciling
  - lastUpdateTime: "2022-07-11T15:40:47Z"
    reason: Stalled
    status: "True"
    type: Stalled
  - lastUpdateTime: "2022-07-11T16:45:33Z"
    status: "True"
    type: Synced
manno commented 2 years ago

Can you provide the output of kubectl get gitrepo -A -o jsonpath='{.items[*].status}'?

izaac commented 2 years ago

@manno sure here it is:

{
    "commit": "763fe7f7defce8f7878469efe5a2b213e4432cbd",
    "conditions": [{
        "lastUpdateTime": "2022-07-12T16:10:55Z",
        "status": "True",
        "type": "Ready"
    }, {
        "lastUpdateTime": "2022-07-12T16:12:12Z",
        "status": "True",
        "type": "Accepted"
    }, {
        "lastUpdateTime": "2022-07-12T16:10:55Z",
        "status": "True",
        "type": "ImageSynced"
    }, {
        "lastUpdateTime": "2022-07-12T16:10:55Z",
        "status": "False",
        "type": "Reconciling"
    }, {
        "lastUpdateTime": "2022-07-12T16:11:19Z",
        "reason": "Stalled",
        "status": "True",
        "type": "Stalled"
    }, {
        "lastUpdateTime": "2022-07-12T16:12:12Z",
        "status": "True",
        "type": "Synced"
    }],
    "desiredReadyClusters": 0,
    "display": {
        "readyBundleDeployments": "0/0",
        "state": "GitUpdating"
    },
    "gitJobStatus": "Failed",
    "lastSyncedImageScanTime": null,
    "observedGeneration": 1,
    "readyClusters": 0,
    "resourceCounts": {
        "desiredReady": 0,
        "missing": 0,
        "modified": 0,
        "notReady": 0,
        "orphaned": 0,
        "ready": 0,
        "unknown": 0,
        "waitApplied": 0
    },
    "summary": {
        "desiredReady": 0,
        "ready": 0
    }
}
manno commented 2 years ago

Interesting, it says "gitJobStatus": "Failed". We could try to look at the gitjob resource status (kubectl get gitjob -A -o jsonpath='{.items[*].status}'), but I guess the error message really is in the gitjob's job log.

Sorry, I can't give you a complete command to run; that job's pod is hard to match. There should be a 'completed' pod with a name like kustomize-f5b73-ldfzv. Is there something about a failed git clone in that pod's log? kubectl logs -n fleet-default kustomize-f5b73-ldfzv --all-containers.
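For anyone triaging a similar status dump, here is a minimal Python sketch of the check done by eye above: it pulls out the display state, the gitJobStatus, and whether the Stalled condition is set. The function name `summarize_gitrepo_status` is hypothetical, and the sample is a trimmed copy of the status posted earlier in this thread, not a complete GitRepo status.

```python
import json


def summarize_gitrepo_status(status):
    """Pick out the fields that mattered in this thread (hypothetical helper)."""
    # Index the conditions by type so a single condition is easy to look up.
    conditions = {c["type"]: c for c in status.get("conditions", [])}
    stalled = conditions.get("Stalled", {})
    return {
        "display_state": status.get("display", {}).get("state"),
        "git_job_status": status.get("gitJobStatus"),
        # Condition status values are the strings "True"/"False", not booleans.
        "stalled": stalled.get("status") == "True",
    }


# Trimmed copy of the status output posted above.
sample = json.loads("""
{
    "commit": "763fe7f7defce8f7878469efe5a2b213e4432cbd",
    "conditions": [
        {"lastUpdateTime": "2022-07-12T16:10:55Z", "status": "True", "type": "Ready"},
        {"lastUpdateTime": "2022-07-12T16:11:19Z", "reason": "Stalled",
         "status": "True", "type": "Stalled"}
    ],
    "display": {"readyBundleDeployments": "0/0", "state": "GitUpdating"},
    "gitJobStatus": "Failed"
}
""")

summary = summarize_gitrepo_status(sample)
print(summary)
```

A "Failed" gitJobStatus together with a Stalled condition is the combination that pointed to the gitjob's pod log here.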

izaac commented 2 years ago

Hi @manno, it reports a public key error, but I have the correct key in my GitHub account and I provide the GitRepo with the correct public/private key pair.

I also see "invalid format". Could it be related to how the key secret is stored by the UI?

{ "level": "error", "ts": 1657724500.1206753, "caller": "git/git.go:47", "msg": "Error running git [fetch --recurse-submodules=yes --depth=1 origin --update-head-ok --force 763fe7f7defce8f7878469efe5a2b213e4432cbd]: exit status 128\nWarning: Permanently added 'github.com' (ED25519) to the list of known hosts.\r\nLoad key \"/tekton/creds/.ssh/id_gitrepo-auth-5vpp7\": invalid format\r\ngit@github.com: Permission denied (publickey).\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n", "stacktrace": "github.com/tektoncd/pipeline/pkg/git.run\n\t/go/src/github.com/tektoncd/pipeline/pkg/git/git.go:47\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\t/go/src/github.com/tektoncd/pipeline/pkg/git/git.go:137\nmain.main\n\t/go/src/github.com/tektoncd/pipeline/cmd/git-init/main.go:52\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225" }
{ "level": "fatal", "ts": 1657724500.1208425, "caller": "git-init/main.go:53", "msg": "Error fetching git repository: failed to fetch [763fe7f7defce8f7878469efe5a2b213e4432cbd]: exit status 128", "stacktrace": "main.main\n\t/go/src/github.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225" }
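The useful part of that log is buried inside the "msg" string. As a rough sketch, the following Python pulls any `Load key "...": <reason>` lines out of a JSON log entry. The helper name `find_key_load_errors` and the regular expression are assumptions written for the message as it appears in this thread, and the sample entry is shortened to the lines that matter.

```python
import json
import re

# Matches the ssh client message seen in the log above, e.g.
#   Load key "/path/to/key": invalid format
LOAD_KEY_ERROR = re.compile(r'Load key "(?P<path>[^"]+)": (?P<reason>[^\r\n]+)')


def find_key_load_errors(line):
    """Return (key path, reason) pairs found in one JSON log entry."""
    entry = json.loads(line)
    message = entry.get("msg", "")
    return [(m.group("path"), m.group("reason")) for m in LOAD_KEY_ERROR.finditer(message)]


# Shortened copy of the first log entry; json.dumps handles the escaping.
log_line = json.dumps({
    "level": "error",
    "caller": "git/git.go:47",
    "msg": (
        "Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.\r\n"
        'Load key "/tekton/creds/.ssh/id_gitrepo-auth-5vpp7": invalid format\r\n'
        "git@github.com: Permission denied (publickey).\r\n"
    ),
})

errors = find_key_load_errors(log_line)
print(errors)
```

Seeing "invalid format" before "Permission denied (publickey)" suggests the key file could not be parsed at all, rather than the key being rejected by GitHub.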

manno commented 2 years ago

Hm, the key needs to be in PEM format (not e.g. DER). In my test I used a gitrepo like this:

kind: GitRepo
apiVersion: fleet.cattle.io/v1alpha1
metadata:
  name: testing
spec:
  repo: git@github.com:fleetrepoci/testeks.git
  clientSecretName: git-auth

And created the key with ssh-keygen -f id_rsa_test -N "" and the secret with kubectl create secret generic git-auth --type kubernetes.io/ssh-auth --from-file=ssh-privatekey=id_rsa_test --from-file=ssh-publickey=id_rsa_test.pub. Though I did have some problems with RSA and switched to an ecdsa key.
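To make it clearer what ends up in the cluster, here is a small Python sketch that builds a Secret manifest equivalent in shape to what the kubectl command above should produce: type kubernetes.io/ssh-auth with base64-encoded ssh-privatekey and ssh-publickey entries. The helper name `build_ssh_auth_secret` and the key material are placeholders, not real keys; the point is only that the bytes, including the final newline, are stored as given.

```python
import base64
import json


def build_ssh_auth_secret(name, private_key, public_key):
    """Return a Secret manifest (as a dict) of type kubernetes.io/ssh-auth."""

    def b64(text):
        # Secret "data" values are base64-encoded bytes.
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/ssh-auth",
        "data": {
            "ssh-privatekey": b64(private_key),
            "ssh-publickey": b64(public_key),
        },
    }


# Placeholder key text (hypothetical); real content comes from ssh-keygen.
secret = build_ssh_auth_secret(
    "git-auth",
    "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----\n",
    "ecdsa-sha2-nistp256 AAAA... test\n",
)
print(json.dumps(secret, indent=2))
```

Decoding `data["ssh-privatekey"]` from a secret created another way (for example through a UI) and comparing it byte for byte with the key file is one way to verify that the correct value ended up in the secret.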

izaac commented 2 years ago

@manno The End-to-End process from the UI which I followed is this:

These steps used to work, but now I am getting that error in the gitjob.

manno commented 2 years ago

We are using a very old version of tekton here: https://github.com/rancher/build-tekton/blob/master/Dockerfile.dapper#L5-L9

We should update that, but didn't yet.

I'd try to verify the correct value ends up in the secret. Are you sure ed25519 keys worked before?

izaac commented 2 years ago

@manno I just created a GitRepo using only kubectl, as you suggested. That worked, and I am using an ed25519 key. No errors in the gitjob.

So it's either how the secret is stored, or Tekton or the job that executes the git command is somehow malforming the keys.

manno commented 2 years ago

So, this only happens when using the Rancher UI to create the keys?

izaac commented 2 years ago

@manno that seems to be the case.

manno commented 1 year ago

This seems to be a UI issue. I also found this in the fleet docs, which seems relevant:

If you are using openssh format for the private key and you are creating it in the UI, make sure a carriage return is appended in the end of the private key.
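As an illustration of that workaround, here is a minimal Python sketch that appends a line break to a pasted private key when it is missing. It assumes that "carriage return" in the docs means a trailing newline (`\n`), and the function name `ensure_trailing_newline` is hypothetical. This is not what the Rancher UI does; it only shows the normalization one could apply before storing the key.

```python
def ensure_trailing_newline(private_key):
    """Return the key text, adding one final newline if it is missing."""
    if private_key.endswith("\n"):
        return private_key
    return private_key + "\n"


# Placeholder key text (hypothetical), as it might look after being pasted
# into a form field that drops the final line break.
pasted = "-----BEGIN OPENSSH PRIVATE KEY-----\n...\n-----END OPENSSH PRIVATE KEY-----"
fixed = ensure_trailing_newline(pasted)
print(repr(fixed[-5:]))
```

The function is idempotent, so applying it to a key that already ends with a newline leaves the key unchanged.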

kkaempf commented 1 year ago

This seems to be a UI issue.

Moving to team/ui

rak-phillip commented 1 year ago

@gaktive transferring to rancher/dashboard for triage.

richard-cox commented 1 year ago

@isaax Can you expand on what this is blocking? We're trying to tidy up/deprecate the confusing status/blocker label.

izaac commented 1 year ago

@richard-cox After the last analysis by Mario it seems this is only occurring when using the UI. Using/Setting ssh repositories with kubectl works. I don't think we need the blocker label anymore.

mmartin24 commented 5 months ago

While working on ui automation for ssh private git repos, we noticed that tests were passing in Rancher 2.8-head using Fleet 0.9.3-rc2 while failing in Rancher 2.7-head using Fleet 0.8.3.

As it turned out, this issue keeps occurring on Rancher 2.7 (latest check done in v2.7-bd0bb772b60a447d352789b3d7c4368ffe50eecf-head): [image]

This works well when a carriage return is added at the end of the private key, as @manno pointed out (on the public key it is optional).

On 2.8-head (latest check on v2.8-10907fed8e43c1ff3124271f74eab63e47723619-head) it is working well without the extra line (and with the extra line as well). I guess this is because I don't see Tekton being used:

[image]

I discussed this with Mario offline. The documentation, in the info panel here, says to use the extra line on private keys. Since this is no longer needed for 2.8, I guess we could:

BTW: Automated tests for 4 private repo providers (GitHub, GitLab, Azure, Bitbucket) using SSH are currently in place here. They currently add an extra line on both keys to ensure it works on all versions. This is subject to change depending on what is decided for this issue.