The problem
When deploying Kubefirst to either AWS or GCP (possibly other clouds too — k3d and Civo are OK), an `exiting with status 1` error appears in the logs and we cannot proceed past it. After much code trawling, it appears that it is this command that is failing.
The problem is that the mapping of the `/.k1` directory into the locally running `kubefirst-api` container is not the same across all cloud providers. This is likely due to authentication differences — e.g., the EKS/GKE kubeconfigs require the AWS/GCP CLI, whereas Civo is self-contained, with all the parameters in the kubeconfig.
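To illustrate the authentication difference, here is an abbreviated, hypothetical kubeconfig fragment (names and values are placeholders, not output from either provider):

```yaml
users:
# EKS-style user: authentication shells out to the AWS CLI via an exec
# plugin, so the `aws` binary (and credentials) must be available inside
# whatever container runs kubectl/client-go. GKE behaves similarly with
# gke-gcloud-auth-plugin.
- name: eks-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: aws
      args: ["eks", "get-token", "--cluster-name", "my-cluster"]
# Civo-style user: credentials are embedded directly in the kubeconfig,
# so it works anywhere the file is mounted, with no extra CLI required.
- name: civo-user
  user:
    client-certificate-data: <base64-cert>
    client-key-data: <base64-key>
```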
Kubefirst version
Definitely appears in v2.4.13 of Kubefirst. Was introduced in v0.1.25 of the API.
Steps to reproduce
Deploy from either the command line or the UI.
Suggested fix
I think there are a few things that need to happen.
Fixing the immediate problem
The immediate problem is that our published version does not work across all cloud providers. I would suggest reverting #363 and then publishing a new version. This should at least address the urgency of the problem.
Fixing the timeout issue in ArgoCD deployment
This is the issue that was being looked at in the original PR. I don't know enough about the problem to be certain, but doing a rollout restart of the deployment, retrying, and waiting feels a bit like we're addressing the symptoms rather than the problem itself. I am available for pairing to help debug this.
If it is indeed the case that this is the only way the problem can be solved (which is fine), then it should use the existing `client-go` configuration rather than shelling out to `kubectl`. `client-go` is used extensively in Kubefirst-API, so I expect it would pick up the correct configuration regardless of cloud provider.
NB: `client-go` doesn't have a `rollout restart` command, so the likely scenario would be adding an annotation to the deployment (see SO). This is the accepted way of forcing a restart of the pods.