kubernetes-sigs / sig-windows-dev-tools

This is a batteries included local development environment for Kubernetes on Windows.
Apache License 2.0

Control plane machine not coming up #246

Closed adeniyistephen closed 8 months ago

adeniyistephen commented 1 year ago

After running make all, the Ubuntu control plane is not coming up.

calico: 3.25.0; containerd: 1.6.15
Bringing machine 'controlplane' up with 'virtualbox' provider...
==> controlplane: Checking if box 'ubuntu/bionic64' version '20221117.0.0' is up to date...
==> controlplane: Clearing any previously set forwarded ports...
==> controlplane: Clearing any previously set network interfaces...
==> controlplane: Preparing network interfaces based on configuration...
    controlplane: Adapter 1: nat
    controlplane: Adapter 2: hostonly
==> controlplane: Forwarding ports...
    controlplane: 22 (guest) => 2222 (host) (adapter 1)
==> controlplane: Running 'pre-boot' VM customizations...
==> controlplane: Booting VM...
==> controlplane: Waiting for machine to boot. This may take a few minutes...
    controlplane: SSH address: 127.0.0.1:2222
    controlplane: SSH username: vagrant
    controlplane: SSH auth method: private key
    controlplane: Warning: Connection reset. Retrying...
    controlplane: Warning: Connection reset. Retrying...
    controlplane: Warning: Connection reset. Retrying...
    controlplane: Warning: Connection reset. Retrying...
    controlplane: Warning: Connection reset. Retrying...

Steps taken:

- Installed vagrant
- Installed VirtualBox version 7.0
- Clone repo
- Run make all

System configuration:

- OS: Fedora 36 running in VirtualBox on a Windows-based machine
- RAM: 12GB (can be extended to 16GB, based on the workload)

Expected: Ubuntu control plane machine to come up and Windows machine to come up for a Kubernetes cluster.

mloskot commented 1 year ago

I'd re-try fresh:

I've been testing this setup extensively over the last few days, and I found that all VMs come up and get provisioned successfully in roughly 1 out of every 2-3 attempts. Most often I have to run vagrant reload --provision winw1 (and, rarely, make 2-vagrant-up).
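The retry routine described above can be sketched as a small wrapper. This is only an illustration, not part of SWDT: the function names `bring_up` and `_run` and the `max_attempts` parameter are hypothetical, and the only commands assumed are the ones named in this thread (`make all` and `vagrant reload --provision winw1`).

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def bring_up(
    first_cmd: Sequence[str],
    retry_cmd: Sequence[str],
    max_attempts: int = 3,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run first_cmd once, then retry_cmd on each later attempt.

    Returns the 1-based attempt number that succeeded, or 0 if every
    attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        cmd = first_cmd if attempt == 1 else retry_cmd
        if runner(cmd) == 0:
            return attempt
    return 0


if __name__ == "__main__":
    # Commands taken from this thread; adjust the machine name as needed.
    result = bring_up(
        ["make", "all"],
        ["vagrant", "reload", "--provision", "winw1"],
    )
    print("succeeded on attempt", result if result else "none")
```

The `runner` argument exists only so the retry logic can be exercised without actually starting any VMs.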

adeniyistephen commented 1 year ago

Thanks @mloskot, I'll have to try that! I switched the Ubuntu machine to another type, thinking the machine had issues, but it still didn't come up. Just saying.

mloskot commented 1 year ago

I have recently run make all many times and I haven't noticed such a problem with the controlplane node; it usually boots up fairly quickly:

Bringing machine 'controlplane' up with 'virtualbox' provider...
==> controlplane: Importing base box 'roboxes/ubuntu2004'...
==> controlplane: Matching MAC address for NAT networking...
==> controlplane: Checking if box 'roboxes/ubuntu2004' version '4.2.14' is up to date...
==> controlplane: Setting the name of the VM: sig-windows-dev-tools_controlplane_1681726008743_81052
==> controlplane: Clearing any previously set network interfaces...
==> controlplane: Preparing network interfaces based on configuration...
    controlplane: Adapter 1: nat
    controlplane: Adapter 2: hostonly
==> controlplane: Forwarding ports...
    controlplane: 22 (guest) => 2222 (host) (adapter 1)
==> controlplane: Running 'pre-boot' VM customizations...
==> controlplane: Booting VM...
==> controlplane: Waiting for machine to boot. This may take a few minutes...
    controlplane: SSH address: 127.0.0.1:2222
    controlplane: SSH username: vagrant
    controlplane: SSH auth method: private key
    controlplane: Warning: Connection reset. Retrying...
    controlplane: Warning: Connection aborted. Retrying...
    controlplane: Warning: Connection reset. Retrying...
    controlplane: Warning: Connection aborted. Retrying...
    controlplane: Warning: Connection reset. Retrying...
    controlplane:
    controlplane: Vagrant insecure key detected. Vagrant will automatically replace
    controlplane: this with a newly generated keypair for better security.
    controlplane:
    controlplane: Inserting generated public key within guest...
    controlplane: Removing insecure key from the guest if it's present...
    controlplane: Key inserted! Disconnecting and reconnecting using new SSH key...
==> controlplane: Machine booted and ready!
...

But I run it directly on Windows 11 (WSL), as you can see in my detailed notes here
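To compare a failing and a successful run such as the two logs in this thread, a small summarizer can help. The sketch below assumes only the message strings visible in those logs ("Connection reset/aborted. Retrying..." and "Machine booted and ready!"); the function name `summarize_boot_log` is hypothetical.

```python
import re

# Message strings copied from the Vagrant output quoted in this thread.
RETRY_RE = re.compile(r"Warning: Connection (reset|aborted)\. Retrying\.\.\.")
READY_MARKER = "Machine booted and ready!"


def summarize_boot_log(log: str, machine: str = "controlplane") -> dict:
    """Count SSH retry warnings and detect the ready marker for one machine."""
    retries = 0
    booted = False
    for line in log.splitlines():
        if machine + ":" not in line:
            continue  # skip lines that belong to other machines
        if RETRY_RE.search(line):
            retries += 1
        elif READY_MARKER in line:
            booted = True
    return {"retries": retries, "booted": booted}
```

A log that only accumulates retries and never reaches the ready marker, like the one in the original report, suggests the guest never finished booting or its SSH service never became reachable.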

adeniyistephen commented 1 year ago

Yes, I can see it. Mine is a completely different environment: I'm running make all on a Fedora VM hosted on a Windows 11 laptop, and that's my issue. I haven't yet retried the steps you suggested earlier: https://github.com/kubernetes-sigs/sig-windows-dev-tools/issues/246#issuecomment-1510485774 I'll do that now and let you know whether the issue remains. I might end up switching to WSL; I didn't try it earlier because I think someone said it wasn't working. If you did something to your WSL to make it work, it would be nice to update the README.

mloskot commented 1 year ago

I'm not a virtualization expert, but is it possible there is a nested virtualization issue in your environment?

adeniyistephen commented 1 year ago

Yeah, I enabled nested VT-x/AMD-V, and it still didn't work. I was able to bring up the machine on its own with just vagrant up controlplane, but not together with the other components when I run make all.
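One way to check whether nested virtualization actually reaches the Linux guest is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) CPU flag in `/proc/cpuinfo` inside that guest. The following is a minimal sketch under that assumption; the function name `hw_virt_flags` is hypothetical, and the check says nothing about whether VirtualBox itself will run well when nested.

```python
from pathlib import Path
from typing import Set


def hw_virt_flags(cpuinfo_text: str) -> Set[str]:
    """Return which of 'vmx' / 'svm' appear on any 'flags' line."""
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= {"vmx", "svm"} & set(value.split())
    return found


if __name__ == "__main__":
    # Run this inside the Linux VM that is supposed to host the nested VMs.
    flags = hw_virt_flags(Path("/proc/cpuinfo").read_text())
    if flags:
        print("hardware virtualization exposed to this guest:", ", ".join(sorted(flags)))
    else:
        print("no vmx/svm flag visible; nested virtualization is not reaching this guest")
```

If neither flag is visible in the guest, a hypervisor running inside it cannot use hardware virtualization, regardless of the settings applied to the inner VMs.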

adeniyistephen commented 1 year ago

@mloskot There are a whole lot of virtualization issues with Fedora. After enabling nested VT-x, here's what I got next when starting the VM from the GUI:

sig-windows-dev-tools_controlplane_1681144330551_15000

Cannot enable nested VT-x/AMD-V without nested-paging and unrestricted guest execution!
(VERR_CPUM_INVALID_HWVIRT_CONFIG).

Have you seen something like this before?

mloskot commented 1 year ago

No, I haven't. It looks like trying a simple Vagrantfile with basic generic Ubuntu and Windows images would be a good test before continuing with SWDT.

claudiubelu commented 1 year ago

Question: so, you have a Windows machine, on which you have Fedora 36 in a VirtualBox VM, in which you're trying to spawn a new VirtualBox VM, in which the control plane will be created? I'm not sure that would work. I don't know if VirtualBox supports nested virtualization in the first place, but even if it did, the performance would be pretty low.

You could also try Hyper-V VMs (you should be able to enable it, especially if you have Windows Pro or Enterprise), and you can enable nested virtualization on those VMs.

adeniyistephen commented 1 year ago

Correct, that is what I wanted to do initially, but it didn't work - but now I'm running the WSL approach on my windows with @mloskot

mloskot commented 1 year ago

> now I'm running the WSL approach on my windows with @mloskot

@adeniyistephen I'd suggest switching over to the new non-WSL approach on Windows, see #264

adeniyistephen commented 1 year ago

Yes, I have switched to your branch and I'll be running both the magefiles and WSL approaches.

mloskot commented 1 year ago

@adeniyistephen FYA, I've just added tester's quickstart to #264

I personally wouldn't bother with WSL. WSL is used as nothing more than a make distribution for SWDT. I guess you could equally use Cygwin or GNU Make from GnuWin32.

adeniyistephen commented 1 year ago

Sure, Awesome.

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

mloskot commented 10 months ago

I think this can be closed as replaced by the currently ongoing SWDT CLI sub-project

/cc @knabben

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

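This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage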

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/sig-windows-dev-tools/issues/246#issuecomment-2007675791):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.