kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

tracking issue for Windows support #1393

Open neolit123 opened 5 years ago

neolit123 commented 5 years ago

kubernetes/enhancements tracking issue:

KEP was added here:


GA graduation:


beta graduation:

~- [ ] upgrades~ (upgrades were delegated to documentation, and scripts for the process are not really needed)


alpha graduation:

a list of cleanup changes that we can do regardless:


side work:

/kind feature /area ecosystem /priority important-longterm /assign

cc @michmike @PatrickLang

neolit123 commented 5 years ago

WIP google doc for ideas: https://docs.google.com/document/d/1yaT7K85qMvZD7Q-ejWHBko1fgGaeGtjGLEZZ_Bz63VA/edit?usp=sharing

neolit123 commented 5 years ago

kubernetes/enhancements tracking issue: https://github.com/kubernetes/enhancements/issues/995

KEP was added here: https://github.com/kubernetes/enhancements/pull/994

neolit123 commented 5 years ago

update the OP with:

a list of cleanup changes that we can do regardless:

benmoss commented 5 years ago

I don't see any preflight checks that are failing or not appropriate for Windows:

PS C:\> .\winsw\join.ps1
I0508 14:27:14.679350    1320 join.go:364] [preflight] found NodeName empty; using OS hostname as NodeName
I0508 14:27:14.681294    1320 initconfiguration.go:105] detected and using CRI socket: tcp://localhost:2375
[preflight] Running pre-flight checks
I0508 14:27:14.690354    1320 preflight.go:90] [preflight] Running general checks
I0508 14:27:14.952085    1320 checks.go:254] validating the existence and emptiness of directory \etc\kubernetes\manifests
I0508 14:27:14.953111    1320 checks.go:292] validating the existence of file \etc\kubernetes\kubelet.conf
I0508 14:27:14.961114    1320 checks.go:292] validating the existence of file \etc\kubernetes\bootstrap-kubelet.conf
I0508 14:27:14.963083    1320 checks.go:105] validating the container runtime
I0508 14:27:15.124971    1320 checks.go:131] validating if the service is enabled and active
I0508 14:27:16.139511    1320 checks.go:524] running all checks
I0508 14:27:16.655173    1320 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0508 14:27:16.671956    1320 checks.go:622] validating kubelet version
I0508 14:27:16.834297    1320 checks.go:131] validating if the service is enabled and active
I0508 14:27:17.475948    1320 checks.go:209] validating availability of port 10250
I0508 14:27:17.476979    1320 checks.go:292] validating the existence of file C:/etc/kubernetes/pki/ca.crt
I0508 14:27:17.485021    1320 checks.go:439] validating if the connectivity type is via proxy or direct
I0508 14:27:17.487030    1320 join.go:426] [preflight] Discovering cluster-info
I0508 14:27:17.488914    1320 token.go:199] [discovery] Trying to connect to API Server "192.168.79.131:6443"
I0508 14:27:17.491183    1320 token.go:74] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.79.131:6443"
I0508 14:27:17.512331    1320 token.go:140] [discovery] Requesting info from "https://192.168.79.131:6443" again to validate TLS against the pinned public key
I0508 14:27:17.529788    1320 token.go:163] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.79.131:6443"
I0508 14:27:17.532935    1320 token.go:205] [discovery] Successfully established connection with API Server "192.168.79.131:6443"
I0508 14:27:17.534895    1320 join.go:440] [preflight] Fetching init configuration
I0508 14:27:17.535888    1320 join.go:473] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0508 14:27:17.571275    1320 interface.go:278] Looking for system interface with a global IPv4 address
I0508 14:27:17.572239    1320 interface.go:196] Interface Ethernet0 is up
I0508 14:27:17.585881    1320 interface.go:302] Skipping: no address family match for "fe80::a977:1755:66ff:8b87" on interface "Ethernet0".
I0508 14:27:17.586472    1320 interface.go:310] Found global unicast address "192.168.79.128" on interface "Ethernet0".
I0508 14:27:17.587191    1320 preflight.go:101] [preflight] Running configuration dependant checks
I0508 14:27:17.594267    1320 controlplaneprepare.go:207] [download-certs] Skipping certs download
I0508 14:27:17.595830    1320 kubelet.go:105] [kubelet-start] writing bootstrap kubelet config file at \etc\kubernetes\bootstrap-kubelet.conf
I0508 14:27:17.604244    1320 kubelet.go:113] [kubelet-start] writing CA certificate at C:/etc/kubernetes/pki/ca.crt
I0508 14:27:17.766276    1320 kubelet.go:131] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "\\var\\lib\\kubelet\\config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "\\var\\lib\\kubelet\\kubeadm-flags.env"
I0508 14:27:18.466047    1320 kubelet.go:148] [kubelet-start] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connectex: No connection could be made because the target machine actively refused it..
I0508 14:28:43.240082    1320 kubelet.go:166] [kubelet-start] preserving the crisocket information for the node
I0508 14:28:43.241154    1320 patchnode.go:30] [patchnode] Uploading the CRI Socket information "tcp://localhost:2375" to the Node API object "win-vb8d2n40slh" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
neolit123 commented 5 years ago

@benmoss did you start the kubelet service using the Start-Service frontend instead of sc?

neolit123 commented 5 years ago

also, did you have to apply the \ -> C:\ fix i did in \etc\kubernetes\kubelet.conf?

ksubrmnn commented 5 years ago

@benmoss Can you share .\winsw\join.ps1?

benmoss commented 5 years ago

I am using WinSW to wrap kubelet.exe as a Service. I really like WinSW as a service wrapper; it would get my vote over using the --windows-service flag.

https://github.com/benmoss/kubeadm-windows/blob/master/join.ps1 https://github.com/benmoss/kubeadm-windows/blob/master/kubelet.xml

To install the service you just need to run kubelet.exe install from that directory. The way WinSW works is you download the WinSW binary, rename it to the name of the service, and put it in the same directory as the corresponding xml config file. kubelet.exe install then registers it as a Windows service.

neolit123 commented 5 years ago

i think it might be a case where sc does something differently. i will try the different options.

benmoss commented 5 years ago

And no, I didn't have to fix the paths in /etc/kubernetes/kubelet.conf. The only path problem I'm running into is that kubelet is joining paths to /etc/kubernetes/pki/ca.crt incorrectly. It errors with

F0508 14:27:19.857413    4916 server.go:251] unable to load client CA file C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt: open C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt: The system cannot find the path specified.

I have been working around that by just copying /etc into /var/lib/kubelet/ but that's obviously not right.
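The error is consistent with the kubelet resolving non-absolute paths from its config file against the config file's directory (C:\var\lib\kubelet); that resolution rule is an assumption inferred from the error message. On Windows, a drive-less path such as `\etc\kubernetes\pki\ca.crt` is only rooted on the current drive, not absolute (Go's `filepath.IsAbs` on Windows likewise returns false for it), which would also explain why the `\ -> C:\` fix helps. The sketch below reproduces the joined path with hypothetical helpers that apply Windows rules regardless of the host OS.

```go
package main

import (
	"fmt"
	"strings"
)

// isWindowsAbs reports whether p is absolute under Windows rules: a drive
// letter plus separator (C:\ or C:/) or a UNC prefix (\\server\share).
// A path like \etc\kubernetes is merely rooted, so it is NOT absolute.
func isWindowsAbs(p string) bool {
	if strings.HasPrefix(p, `\\`) || strings.HasPrefix(p, "//") {
		return true
	}
	return len(p) >= 3 && isLetter(p[0]) && p[1] == ':' && (p[2] == '\\' || p[2] == '/')
}

func isLetter(c byte) bool { return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') }

// resolveRelativeTo joins any non-absolute path onto baseDir, mimicking
// (by assumption) how config-relative paths end up under the kubelet dir.
func resolveRelativeTo(baseDir, p string) string {
	if isWindowsAbs(p) {
		return p
	}
	p = strings.ReplaceAll(p, "/", `\`)
	return strings.TrimRight(baseDir, `\`) + `\` + strings.TrimLeft(p, `\`)
}

func main() {
	base := `C:\var\lib\kubelet`
	for _, p := range []string{`\etc\kubernetes\pki\ca.crt`, `/etc/kubernetes/pki/ca.crt`, `C:\etc\kubernetes\pki\ca.crt`} {
		fmt.Printf("%-30s -> %s\n", p, resolveRelativeTo(base, p))
	}
}
```

The first two inputs both resolve to `C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt`, the path from the error, while the drive-qualified path is left unchanged.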

neolit123 commented 5 years ago

updated the OP with the latest merged PRs. for 1.15 (alpha), the remaining items are the install script and docs.

EDIT: looks like the docs and script will miss the 1.15 release deadline.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

neolit123 commented 4 years ago

/remove-lifecycle stale

immuzz commented 4 years ago

SIG-Windows Triage meeting

We will be tracking it here: https://github.com/kubernetes/enhancements/issues/995. Closing this for now.

immuzz commented 4 years ago

/close

k8s-ci-robot commented 4 years ago

@immuzz: Closing this issue.

In response to [this](https://github.com/kubernetes/kubeadm/issues/1393#issuecomment-682068583):

>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

neolit123 commented 4 years ago

@immuzz this issue is more granular and tracks separate development items compared to the main https://github.com/kubernetes/enhancements/issues/995

it should remain open.

immuzz commented 4 years ago

@marosset @michmike

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

neolit123 commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

k8s-triage-robot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
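The bot messages describe a simple timeline: stale after 90d of inactivity, rotten after a further 30d, closed after another 30d, with `/remove-lifecycle` resetting the clock and `/lifecycle frozen` exempting the issue. A simplified Go model of that timeline follows; `lifecycleState` is hypothetical and assumes a single inactivity counter, which is not how the triage bot is actually implemented.

```go
package main

import "fmt"

// lifecycleState maps days of inactivity to the lifecycle label described by
// the triage bot. Assumption: a /remove-lifecycle command resets daysInactive
// to zero, and a frozen issue never goes stale.
func lifecycleState(daysInactive int, frozen bool) string {
	const staleAfter, rotAfter, closeAfter = 90, 30, 30
	switch {
	case frozen:
		return "frozen"
	case daysInactive >= staleAfter+rotAfter+closeAfter:
		return "closed"
	case daysInactive >= staleAfter+rotAfter:
		return "rotten"
	case daysInactive >= staleAfter:
		return "stale"
	default:
		return "fresh"
	}
}

func main() {
	for _, d := range []int{30, 100, 125, 160} {
		fmt.Printf("%3d days inactive: %s\n", d, lifecycleState(d, false))
	}
	fmt.Println("frozen issue:", lifecycleState(400, true))
}
```

In this model an issue closes after 150 days without human activity, which is why the repeated `/remove-lifecycle stale` and `/remove-lifecycle rotten` comments kept this issue open.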

neolit123 commented 3 years ago

/remove-lifecycle rotten

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 commented 3 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 commented 2 years ago

/lifecycle frozen

Windows k8s now supports HostProcess (privileged) containers. This will simplify the CNI / kube-proxy deployment, and we can graduate the kubeadm support to GA.

pacoxu commented 2 years ago

/cc TODO: once the Windows unit tests run regularly, we need a TestGrid dashboard that shows their code coverage, like https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit.

neolit123 commented 2 months ago

recently had to do some fixes in the system validators library related to parsing the OS name on Windows after @jsturtevant reported the issue:

this is another thing we fixed recently, which took a while:

also checked what we have in the KEP for GA graduation:

The feature is well tested and adopted by the community. e2e tests are stable and consistent with other SIG-Windows CI signals. Documentation is complete.

i think we are pretty much done with this thanks to the CAPZ signal, but there is one missing action item (AI) that has not been addressed for ~2 years. it is the result of a refactor of the page for adding Windows nodes that happened at some point.

cc @jsturtevant @marosset @aravindhp @knabben

(see my latest comment/proposal there https://github.com/kubernetes/website/issues/34476#issuecomment-2340637658) can we just xref the sig-windows-tools guides from the windows guide and close/repurpose that website ticket?

i think after that we could just say that kubeadm support is GA. the kube-proxy / CNI story is still not so simple for Windows users, but that seems out of band. same for other documentation such as https://github.com/kubernetes/website/issues/31428

neolit123 commented 2 months ago

i think after that we could just say that kubeadm support is GA.

if we agree on that i can close:

and PR the KEP with a GA status.

sftim commented 2 months ago

GA

For GA features, we (very much) like to have docs.

neolit123 commented 2 months ago

joined the sig windows meeting today, and we discussed the docs part. sig windows agreed with my proposal here:

on the technical side there seem to be a couple of GA blockers around kube-proxy: