konstructio / kubefirst-api

Kubefirst API that serves console frontend

Cannot deploy AWS or GCP due to kubectl command failing in the API #376

Closed mrsimonemms closed 2 months ago

mrsimonemms commented 2 months ago

This issue was introduced by #363

The problem

When deploying Kubefirst to either AWS or GCP (other clouds may also be affected - k3d and Civo are fine), an "exiting with status 1" error appears in the logs and the deployment cannot proceed past it. After much code trawling, it appears that it is this command that is failing.
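For reference, a minimal sketch of how that kind of failure surfaces when Go code shells out to kubectl via os/exec: a non-zero exit code comes back as an opaque "exit status 1", with the real cause only visible in the command's output. The exact command and the /.k1 kubeconfig path below are illustrative assumptions, not the actual code from #363 - the failing command is the one linked above.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Illustrative command and kubeconfig path only.
	cmd := exec.Command("kubectl",
		"--kubeconfig", "/.k1/kubeconfig",
		"rollout", "restart", "deployment/argocd-server", "-n", "argocd",
	)
	out, err := cmd.CombinedOutput()
	if err != nil {
		// If the kubeconfig needs the aws/gcloud CLI and the container lacks
		// it (or the expected /.k1 mount), all the caller sees is this error.
		fmt.Printf("kubectl failed: %v\noutput: %s\n", err, out)
	}
}
```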

The problem is that the mapping of the /.k1 directory into the locally running kubefirst-api container is not the same across all cloud providers. This is likely due to authentication differences - e.g. the EKS/GKE kubeconfigs require the AWS/GCP CLI, whereas Civo is self-contained, with all the parameters in the kubeconfig.
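To make that authentication difference concrete, here is a small sketch using client-go's clientcmd package that loads a kubeconfig and reports whether it relies on an exec credential plugin. The /.k1/kubeconfig path is an assumption; the point is that EKS/GKE configs typically declare an external binary (aws / gke-gcloud-auth-plugin) that must exist inside the kubefirst-api container, while Civo's config is self-contained.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed path to the kubeconfig mounted under /.k1.
	cfg, err := clientcmd.LoadFromFile("/.k1/kubeconfig")
	if err != nil {
		panic(err)
	}

	for name, auth := range cfg.AuthInfos {
		if auth.Exec != nil {
			// e.g. "aws" (eks get-token) or "gke-gcloud-auth-plugin" -
			// these binaries must be present wherever kubectl/client-go runs.
			fmt.Printf("user %q requires external binary %q\n", name, auth.Exec.Command)
		} else {
			// Civo-style: token/cert embedded directly in the kubeconfig.
			fmt.Printf("user %q is self-contained\n", name)
		}
	}
}
```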

Kubefirst version

Definitely appears in v2.4.13 of Kubefirst. It was introduced in v0.1.25 of the API.

Steps to reproduce

Suggested fix

I think there are a few things that need to happen.

Fixing the immediate problem

The immediate problem is that our published version does not work for multiple cloud providers. I would suggest reverting #363 and then publishing a new version. This should at least address the urgent part of the problem.

Fixing the timeout issue in ArgoCD deployment

This is the issue that was being looked at in the original PR. I don't know enough about the problem to be certain, but doing a rollout restart of the deployment, retrying and waiting feels a bit like we're addressing the symptoms rather than the problem itself. I am available for pairing on this to help debug the underlying issue.

If it is indeed the case that this is the only way the problem can be solved (which is fine), then it should use the existing client-go configuration rather than shelling out to kubectl. client-go is used extensively in kubefirst-api, so I expect this would pick up the correct configuration regardless of cloud provider.
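As a sketch of what that could look like, the API can build a typed clientset from a rest.Config instead of invoking the kubectl binary. The package, helper name and kubeconfig-path parameter below are assumptions for illustration; the real code should reuse whatever configuration the API already constructs per cloud provider.

```go
package kubeutil

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// newClientset is a hypothetical helper: in practice the API would reuse the
// rest.Config it already builds for the target cloud rather than re-reading a
// kubeconfig file from disk.
func newClientset(kubeconfigPath string) (*kubernetes.Clientset, error) {
	restCfg, err := clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(restCfg)
}
```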

NB. client-go doesn't have a rollout restart command, so the likely scenario would be adding an annotation to the deployment (see SO). This is the accepted way of forcing a restart of the pod.
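If we go the annotation route, a minimal client-go sketch could look like the following (helper name, deployment name and namespace are placeholders): it patches the deployment's pod template with the same kubectl.kubernetes.io/restartedAt annotation that `kubectl rollout restart` sets.

```go
package kubeutil

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// restartDeployment forces a rolling restart by bumping the restartedAt
// annotation on the pod template, mirroring what kubectl rollout restart does.
func restartDeployment(ctx context.Context, clientset *kubernetes.Clientset, namespace, name string) error {
	patch := fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
		time.Now().Format(time.RFC3339),
	)
	_, err := clientset.AppsV1().Deployments(namespace).Patch(
		ctx, name, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{},
	)
	return err
}
```

Because this goes through the clientset, the kubeconfig/auth handling is identical to the rest of the API's code, and the strategic merge patch only changes the timestamp annotation, so repeated calls stay harmless.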