vmware-tanzu / cluster-api-provider-bringyourownhost

Kubernetes Cluster API Provider BYOH for already-provisioned hosts running Linux.
Apache License 2.0

Move byohosts after clusterctl move to new management cluster #853

Open robinAwallace opened 8 months ago

robinAwallace commented 8 months ago

What steps did you take and what happened:

Hello,

I have a BYOH cluster that I would like to move to a new management cluster using clusterctl move: clusterctl move --kubeconfig <byoh-management-cluster> --to-kubeconfig <new-management-cluster>.
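
For reference, the move invocation described above can be wrapped in a small helper so that it is easy to preview first. This is only a sketch: the kubeconfig paths and namespace are placeholders, and the function name move_cluster is made up for illustration. It relies on the --namespace and --dry-run flags of clusterctl move.

```shell
#!/usr/bin/env bash
# Sketch only: move_cluster is a hypothetical helper, not part of clusterctl or BYOH.

# move_cluster SRC_KUBECONFIG DST_KUBECONFIG NAMESPACE [extra clusterctl flags...]
# Moves the Cluster API objects in NAMESPACE from the source management
# cluster to the destination one. Extra flags (for example --dry-run) are
# passed through unchanged.
move_cluster() {
  local src="$1" dst="$2" ns="$3"
  shift 3
  clusterctl move \
    --kubeconfig "$src" \
    --to-kubeconfig "$dst" \
    --namespace "$ns" \
    "$@"
}

# Example (placeholders, not run here):
#   move_cluster ./byoh-mgmt.kubeconfig ./new-mgmt.kubeconfig default --dry-run
#   move_cluster ./byoh-mgmt.kubeconfig ./new-mgmt.kubeconfig default
```

As reported in this issue, the ByoHost objects are not part of what gets moved, so a preview will not list them either.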

All resources are moved, but the ByoHosts are not, which is maybe not that strange. But I'm not able to register the machines with the new management cluster.

I have tried generating new bootstrap kubeconfigs for the new management cluster, sending them to the machines, and restarting the byoh-agent, with no success.

What did you expect to happen:

After clusterctl move I would like to re-register the machines to the new management cluster.

Anything else you would like to add:

Environment:

dharmjit commented 8 months ago

Hi @robinAwallace, I think the cluster move has not been validated for BYOH, and it certainly requires an agent restart so that the agent talks to the new management cluster.

All resources are moved, but the ByoHosts are not, which is maybe not that strange

This might be due to the permissions on ByoHost CRDs. Did you get any errors with clusterctl move?

restart the byoh-agent with no success

Can you share the agent output/errors too?

robinAwallace commented 8 months ago

Hello 🙂

No, there were no errors from clusterctl move. The only error I got was that the BYOH controller could not find the ByoHosts.

But I got it to work. After running clusterctl move, I had to move the ByoHost objects manually from the first cluster to the new management cluster. To do this, I had to delete the BYOH webhook that stops you from adding ByoHost objects.

Then I had to create new kubeconfigs with the correct cert and IP for the new control plane. I also had to create a new CSR to validate the byoh-agent user. Finally, I sent the new kubeconfig to the nodes at /.byoh/config and restarted the agent. Then everything worked fine 🥳
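
The thread does not say how the new CSR was created, so the following is only a rough sketch of how CSRs could be inspected and approved on the new management cluster with standard kubectl commands. The helper names list_csrs and approve_csr are hypothetical, and the CSR name must be taken from the actual kubectl get csr output.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the commands are plain kubectl.

# list_csrs KUBECONFIG
# Lists CertificateSigningRequests on the given management cluster.
list_csrs() {
  kubectl --kubeconfig "$1" get csr
}

# approve_csr KUBECONFIG CSR_NAME
# Approves one CSR by name. CSR_NAME is whatever `kubectl get csr` shows
# for the host; it is not assumed here.
approve_csr() {
  kubectl --kubeconfig "$1" certificate approve "$2"
}

# Example (placeholders, not run here):
#   list_csrs ./new-mgmt.kubeconfig
#   approve_csr ./new-mgmt.kubeconfig <csr-name>
```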

dharmjit commented 8 months ago

Awesome. There are still some UX gaps, but it would be nice to have the above manual process captured in a doc. Would you like to create a PR documenting the steps you followed?

syndicut commented 3 months ago

But I got it to work. After running clusterctl move, I had to move the ByoHost objects manually from the first cluster to the new management cluster. To do this, I had to delete the BYOH webhook that stops you from adding ByoHost objects.

Then I had to create new kubeconfigs with the correct cert and IP for the new control plane. I also had to create a new CSR to validate the byoh-agent user. Finally, I sent the new kubeconfig to the nodes at /.byoh/config and restarted the agent. Then everything worked fine 🥳

Tried to follow the same process, but byoh-agent got stuck with:

I0315 15:46:24.611558   10704 host_reconciler.go:91]  "msg"="Machine ref not yet set"

I believe the reason is that the Status of the ByoHost is not copied to the destination cluster, and because the ByoHost still has the AttachedByoMachineLabel, the BYOH infrastructure controller does not set it again. I tried deleting the AttachedByoMachineLabel label and restarting the BYOH infrastructure controller, but it didn't help. The BYOH infrastructure controller now says:

I0315 16:16:12.321312       1 byomachine_controller.go:270]  "msg"="Attempting host reservation" 
I0315 16:16:12.321493       1 byomachine_controller.go:519]  "msg"="No hosts found, waiting.."
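
To check whether a copied ByoHost still carries the attached-machine label but has lost its status, its labels and status can be printed on the destination cluster. This is a minimal sketch using plain kubectl; the helper name show_byohost and all arguments are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: show_byohost is a hypothetical helper around plain kubectl.

# show_byohost KUBECONFIG NAMESPACE BYOHOST_NAME
show_byohost() {
  local kc="$1" ns="$2" name="$3"
  local kind="byohosts.infrastructure.cluster.x-k8s.io"
  # Labels: look for the attached ByoMachine label discussed above.
  kubectl --kubeconfig "$kc" -n "$ns" get "$kind" "$name" --show-labels
  # Status: empty output suggests the status was not carried over.
  kubectl --kubeconfig "$kc" -n "$ns" get "$kind" "$name" -o jsonpath='{.status}'
}

# Example (placeholders, not run here):
#   show_byohost ./new-mgmt.kubeconfig default host1
```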

robinAwallace commented 3 months ago

Hmm, I did not have this issue.

But yes, as you say, the move command does not copy over the ByoHosts. So I had to copy them manually by running kubectl get byohosts.infrastructure.cluster.x-k8s.io -n <namespace> <byohost> -oyaml and saving the output to a file. But before I could apply it to the new management cluster, I had to temporarily remove the webhook, validatingwebhookconfigurations.admissionregistration.k8s.io byoh-validating-webhook-configuration.

I hope you get it to work 🙂
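
The manual copy described above could be scripted roughly as follows. This is a sketch, not a validated procedure: the helper names and file paths are placeholders, and backing up the webhook configuration and re-applying it afterwards is an assumption about what "temporarily remove" means, since the thread does not say how the webhook was restored. Note that a plain apply does not carry over the status of the ByoHost.

```shell
#!/usr/bin/env bash
# Sketch only: export_byohost and import_byohost are hypothetical helpers.
BYOH_KIND="byohosts.infrastructure.cluster.x-k8s.io"
WEBHOOK_KIND="validatingwebhookconfigurations.admissionregistration.k8s.io"
WEBHOOK_NAME="byoh-validating-webhook-configuration"

# export_byohost SRC_KUBECONFIG NAMESPACE BYOHOST_NAME OUT_FILE
# Saves one ByoHost from the old management cluster to a YAML file.
export_byohost() {
  kubectl --kubeconfig "$1" -n "$2" get "$BYOH_KIND" "$3" -o yaml > "$4"
}

# import_byohost DST_KUBECONFIG BYOHOST_FILE WEBHOOK_BACKUP_FILE
# Removes the validating webhook, applies the ByoHost, then restores the
# webhook from the backup (the restore step is an assumption).
import_byohost() {
  local dst="$1" manifest="$2" backup="$3"
  # Stop early if the webhook cannot be backed up.
  kubectl --kubeconfig "$dst" get "$WEBHOOK_KIND" "$WEBHOOK_NAME" -o yaml > "$backup" || return 1
  kubectl --kubeconfig "$dst" delete "$WEBHOOK_KIND" "$WEBHOOK_NAME"
  kubectl --kubeconfig "$dst" apply -f "$manifest"
  # Restore the webhook even if the apply above failed.
  kubectl --kubeconfig "$dst" apply -f "$backup"
}

# Example (placeholders, not run here):
#   export_byohost ./byoh-mgmt.kubeconfig default host1 ./host1.yaml
#   import_byohost ./new-mgmt.kubeconfig ./host1.yaml ./byoh-webhook-backup.yaml
```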

syndicut commented 3 months ago

Hmm, I did not have this issue.

But yes, as you say, the move command does not copy over the ByoHosts. So I had to copy them manually by running kubectl get byohosts.infrastructure.cluster.x-k8s.io -n <namespace> <byohost> -oyaml and saving the output to a file. But before I could apply it to the new management cluster, I had to temporarily remove the webhook, validatingwebhookconfigurations.admissionregistration.k8s.io byoh-validating-webhook-configuration.

I hope you get it to work 🙂

I think my problem was that I skipped this part:

Also I had to create a new csr to validate byoh agent user. Finally I sent the new kubeconfig to the nodes at /.byoh/config

But I got it to work, though I had to add a small patch (nebius/cluster-api-provider-bringyourownhost#9).

With that patch, the move process is very simple:

  1. Run clusterctl move
  2. Then repeat the steps defined here https://github.com/vmware-tanzu/cluster-api-provider-bringyourownhost/blob/main/docs/getting_started.md#generating-the-bootstrap-kubeconfig-file (and copy the kubeconfig to the host)
  3. Then just remove ~/.byoh/config and restart byoh-agent - the agent then recreates the ByoHost in the new cluster and everything just works
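
The host-side part of these steps might be scripted along these lines. This is a sketch under several assumptions that are not in the thread: the host is reachable over SSH, the bootstrap kubeconfig has already been generated, and the agent can be restarted with a single command supplied by the caller. The helper name reregister_host and the remote file name bootstrap-kubeconfig.conf are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: reregister_host is a hypothetical helper; how the agent is
# restarted depends on how it was started on the host.

# reregister_host SSH_TARGET BOOTSTRAP_KUBECONFIG RESTART_CMD
reregister_host() {
  local target="$1" bootstrap="$2" restart_cmd="$3"
  # Copy the new bootstrap kubeconfig into the remote home directory.
  scp "$bootstrap" "$target:bootstrap-kubeconfig.conf" || return 1
  # Remove the old agent kubeconfig so the agent bootstraps again.
  ssh "$target" 'rm -f ~/.byoh/config' || return 1
  # Restart the agent with the caller-supplied command.
  ssh "$target" "$restart_cmd"
}

# Example (placeholders, not run here; the restart command is an assumption):
#   reregister_host user@host1 ./bootstrap-kubeconfig.conf '<command that restarts byoh-agent>'
```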

I think I'll write some e2e tests and open a PR with them (plus some documentation about the move process).