Open neolit123 opened 5 years ago
WIP google doc for ideas: https://docs.google.com/document/d/1yaT7K85qMvZD7Q-ejWHBko1fgGaeGtjGLEZZ_Bz63VA/edit?usp=sharing
kubernetes/enhancements tracking issue: https://github.com/kubernetes/enhancements/issues/995
KEP was added here: https://github.com/kubernetes/enhancements/pull/994
update the OP with:
a list of cleanup changes that we can do regardless:
I don't see any preflight checks that are failing or not appropriate for Windows:
PS C:\> .\winsw\join.ps1
I0508 14:27:14.679350 1320 join.go:364] [preflight] found NodeName empty; using OS hostname as NodeName
I0508 14:27:14.681294 1320 initconfiguration.go:105] detected and using CRI socket: tcp://localhost:2375
[preflight] Running pre-flight checks
I0508 14:27:14.690354 1320 preflight.go:90] [preflight] Running general checks
I0508 14:27:14.952085 1320 checks.go:254] validating the existence and emptiness of directory \etc\kubernetes\manifests
I0508 14:27:14.953111 1320 checks.go:292] validating the existence of file \etc\kubernetes\kubelet.conf
I0508 14:27:14.961114 1320 checks.go:292] validating the existence of file \etc\kubernetes\bootstrap-kubelet.conf
I0508 14:27:14.963083 1320 checks.go:105] validating the container runtime
I0508 14:27:15.124971 1320 checks.go:131] validating if the service is enabled and active
I0508 14:27:16.139511 1320 checks.go:524] running all checks
I0508 14:27:16.655173 1320 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0508 14:27:16.671956 1320 checks.go:622] validating kubelet version
I0508 14:27:16.834297 1320 checks.go:131] validating if the service is enabled and active
I0508 14:27:17.475948 1320 checks.go:209] validating availability of port 10250
I0508 14:27:17.476979 1320 checks.go:292] validating the existence of file C:/etc/kubernetes/pki/ca.crt
I0508 14:27:17.485021 1320 checks.go:439] validating if the connectivity type is via proxy or direct
I0508 14:27:17.487030 1320 join.go:426] [preflight] Discovering cluster-info
I0508 14:27:17.488914 1320 token.go:199] [discovery] Trying to connect to API Server "192.168.79.131:6443"
I0508 14:27:17.491183 1320 token.go:74] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.79.131:6443"
I0508 14:27:17.512331 1320 token.go:140] [discovery] Requesting info from "https://192.168.79.131:6443" again to validate TLS against the pinned public key
I0508 14:27:17.529788 1320 token.go:163] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.79.131:6443"
I0508 14:27:17.532935 1320 token.go:205] [discovery] Successfully established connection with API Server "192.168.79.131:6443"
I0508 14:27:17.534895 1320 join.go:440] [preflight] Fetching init configuration
I0508 14:27:17.535888 1320 join.go:473] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0508 14:27:17.571275 1320 interface.go:278] Looking for system interface with a global IPv4 address
I0508 14:27:17.572239 1320 interface.go:196] Interface Ethernet0 is up
I0508 14:27:17.585881 1320 interface.go:302] Skipping: no address family match for "fe80::a977:1755:66ff:8b87" on interface "Ethernet0".
I0508 14:27:17.586472 1320 interface.go:310] Found global unicast address "192.168.79.128" on interface "Ethernet0".
I0508 14:27:17.587191 1320 preflight.go:101] [preflight] Running configuration dependant checks
I0508 14:27:17.594267 1320 controlplaneprepare.go:207] [download-certs] Skipping certs download
I0508 14:27:17.595830 1320 kubelet.go:105] [kubelet-start] writing bootstrap kubelet config file at \etc\kubernetes\bootstrap-kubelet.conf
I0508 14:27:17.604244 1320 kubelet.go:113] [kubelet-start] writing CA certificate at C:/etc/kubernetes/pki/ca.crt
I0508 14:27:17.766276 1320 kubelet.go:131] [kubelet-start] Stopping the kubelet
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "\\var\\lib\\kubelet\\config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "\\var\\lib\\kubelet\\kubeadm-flags.env"
I0508 14:27:18.466047 1320 kubelet.go:148] [kubelet-start] Starting the kubelet
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connectex: No connection could be made because the target machine actively refused it..
I0508 14:28:43.240082 1320 kubelet.go:166] [kubelet-start] preserving the crisocket information for the node
I0508 14:28:43.241154 1320 patchnode.go:30] [patchnode] Uploading the CRI Socket information "tcp://localhost:2375" to the Node API object "win-vb8d2n40slh" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
@benmoss did you start the kubelet service using the Start-Service frontend instead of sc?
also, did you have to apply the \ -> c:\ fix i did in \etc\kubernetes\kubelet.conf?
@benmoss Can you share .\winsw\join.ps1?
I am using WinSW to wrap kubelet.exe as a Service. I really like WinSW as a service wrapper, it would be my vote rather than using the --windows-service flag.
https://github.com/benmoss/kubeadm-windows/blob/master/join.ps1 https://github.com/benmoss/kubeadm-windows/blob/master/kubelet.xml
To install the service you just need to run kubelet.exe install from that directory. The way WinSW works is you download the WinSW binary, rename it to the name of the service, and put it in the same directory as the corresponding xml config file. kubelet.exe install then registers it as a Windows service.
i think it might be a case where sc does something differently. i will try the different options.
And no, I didn't have to fix the paths in /etc/kubernetes/kubelet.conf. The only path problem I'm running into is that kubelet is joining paths to /etc/kubernetes/pki/ca.crt incorrectly. It errors with
F0508 14:27:19.857413 4916 server.go:251] unable to load client CA file C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt: open C:\var\lib\kubelet\etc\kubernetes\pki\ca.crt: The system cannot find the path specified.
I have been working around that by just copying /etc into /var/lib/kubelet/ but that's obviously not right.
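One plausible explanation for the error above is that a path such as /etc/kubernetes/pki/ca.crt has no drive letter, so under Windows rules it is not treated as absolute and ends up joined onto a base directory. The sketch below illustrates that rule only; `isWindowsAbs` and `resolve` are hypothetical helpers written for this example and are not the kubelet's code.

```go
package main

import (
	"fmt"
	"strings"
)

// isWindowsAbs reports whether p is absolute under Windows rules:
// a drive letter, a colon, then a separator (for example `C:\etc`).
// UNC paths are ignored to keep the sketch short.
func isWindowsAbs(p string) bool {
	if len(p) < 3 {
		return false
	}
	d := p[0]
	isLetter := (d >= 'a' && d <= 'z') || (d >= 'A' && d <= 'Z')
	return isLetter && p[1] == ':' && (p[2] == '\\' || p[2] == '/')
}

// resolve is a hypothetical helper: absolute paths are returned as they are
// (with normalized separators), everything else is joined onto baseDir.
func resolve(baseDir, p string) string {
	norm := strings.ReplaceAll(p, "/", `\`)
	if isWindowsAbs(p) {
		return norm
	}
	return strings.TrimRight(baseDir, `\`) + `\` + strings.TrimLeft(norm, `\`)
}

func main() {
	base := `C:\var\lib\kubelet`
	// No drive letter: treated as relative and joined onto the base directory.
	fmt.Println(resolve(base, "/etc/kubernetes/pki/ca.crt"))
	// With a drive letter the path is left alone.
	fmt.Println(resolve(base, "C:/etc/kubernetes/pki/ca.crt"))
}
```

Under that assumption, writing the CA path with an explicit drive letter would avoid the join, which matches the direction of the \ -> c:\ fix mentioned earlier in the thread.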
updated OP with latest PRs merged. for 1.15 (alpha) remaining items are install script and docs.
EDIT: looks like the docs and script will miss the 1.15 release deadlines.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
SIG-Windows Triage meeting
We will be tracking it in https://github.com/kubernetes/enhancements/issues/995, so closing this for now.
/close
@immuzz: Closing this issue.
@immuzz this issue is more granular and tracks separate development items compared to the main https://github.com/kubernetes/enhancements/issues/995
it should remain open.
@marosset @michmike
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle frozen
Windows k8s now supports host process (privileged) containers. This will simplify the CNI / kube-proxy deployment, and we can graduate the kubeadm support to GA.
/cc TODO: once the Windows unit tests can run regularly, we need a testgrid dashboard that tracks their code coverage, like https://testgrid.k8s.io/sig-testing-canaries#ci-kubernetes-coverage-unit.
recently had to do some fixes in the system validators library related to parsing the OS name on Windows after @jsturtevant reported the issue:
this is another thing we fixed recently, which took a while:
also checked what we have in the KEP for GA graduation:
The feature is well tested and adapted by the community. e2e tests are stable and consistent with other SIG-Windows CI signals. Documentation is complete.
i think we are pretty much done with this thanks to the CAPZ signal, but there is one missing action item that has not been addressed for ~2 years. it's the result of a refactor that happened at some point in the page for adding Windows nodes.
cc @jsturtevant @marosset @aravindhp @knabben
(see my latest comment/proposal there https://github.com/kubernetes/website/issues/34476#issuecomment-2340637658) can we just xref the sig-windows-tools guides from the windows guide and close/repurpose that website ticket?
i think after that we could just say that kubeadm support is GA. the kube-proxy / CNI story is still not so simple for Windows users, but that seems out of band. same for other documentation such as https://github.com/kubernetes/website/issues/31428
i think after that we could just say that kubeadm support is GA.
if we agree on that i can close:
and PR the KEP with a GA status.
GA
For GA features, we (very much) like to have docs.
joined the sig windows meeting today, and we discussed the docs part. sig windows agreed with my proposal here:
on the technical side there seem to be a couple of GA blockers around kube-proxy:
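kubernetes/enhancements tracking issue: https://github.com/kubernetes/enhancements/issues/995
KEP was added here: https://github.com/kubernetes/enhancements/pull/994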
GA graduation:
[x] kubeadm is actively being tested by the CAPZ provider: https://github.com/kubernetes-sigs/cluster-api-provider-azure
[x] add a dedicated task page for "adding windows nodes": https://github.com/kubernetes/website/issues/34476
[ ] the kube-proxy image for Windows is still not auto-built on k8s releases
[ ] kube-proxy / CNI on Windows need a contract, which might require some wider ecosystem changes:
beta graduation:
~- [ ] upgrades~ upgrades were delegated to documentation and having scripts for the process is not really needed.
[x] add remaining scripts to sig-windows-tools assigned: @benmoss PR: https://github.com/kubernetes-sigs/sig-windows-tools/pull/34
[x] set up e2e tests assigned: @benmoss @neolit123 PRs: https://github.com/kubernetes/test-infra/pull/16718 https://github.com/kubernetes-sigs/sig-windows-tools/pull/39 https://k8s-testgrid.appspot.com/sig-windows#kubeadm-windows-gcp-k8s-stable status: debugging e2e failures / flakes.
[x] finalize the documentation assigned: @benmoss PR: https://github.com/kubernetes/website/pull/19217 status: merged
alpha graduation:
a list of cleanup changes that we can do regardless:
[x] fix Windows related paths and defaults assigned: @ksubrmnn PR: https://github.com/kubernetes/kubernetes/pull/77710 PR: https://github.com/kubernetes/kubernetes/pull/78053
[x] kube-proxy retry mechanic assigned: @ksubrmnn PR: https://github.com/kubernetes/kubernetes/pull/78612
[x] flanneld should support a flag for its config assigned: @neolit123 PR: https://github.com/coreos/flannel/pull/1136
[x] docs assigned: @ksubrmnn PR: https://github.com/kubernetes/website/pull/14644
[x] install script assigned: @ksubrmnn PR: https://github.com/kubernetes-sigs/sig-windows-tools/pull/1 PR: TODO
side work:
[x] fix wrongly defaulted kubelet flags on windows: PR: TODO https://github.com/kubernetes/kubeadm/issues/2967
[ ] add preflight checks (if needed) assigned: @benmoss PR: TODO possibly only support 1803+? also see https://github.com/kubernetes/kubernetes/blob/0f93328c7a051e28a097270daaf7a7ff6f90bae0/cmd/kubeadm/app/util/system/types_windows.go
[x] don't depend on powershell calls: both kubeadm and pkg/util/initsystem depend on powershell. these should be system calls instead. assigned: @ksubrmnn PR: https://github.com/kubernetes/kubernetes/pull/77989 PR: https://github.com/kubernetes/kubernetes/pull/78189 PR: TODO system checks still have this https://github.com/kubernetes/kubernetes/blob/0f93328c7a051e28a097270daaf7a7ff6f90bae0/cmd/kubeadm/app/util/system/types_windows.go
[x] fix the symbolic links that are currently required in https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/kubeadm/scripts/PrepareNode.ps1#L65 see https://github.com/kubernetes/kubeadm/issues/2330 https://github.com/kubernetes/kubeadm/issues/2419
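The open preflight item above asks whether only 1803+ should be supported. A rough sketch of what such a check could look like follows, assuming the OS version is available as a "major.minor.build" string and that Windows version 1803 corresponds to build 17134; `buildNumber` and `checkMinimumBuild` are hypothetical names and do not come from kubeadm.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minBuild1803 is the build number of Windows version 1803 (10.0.17134).
const minBuild1803 = 17134

// buildNumber extracts the build from a "major.minor.build" version string.
func buildNumber(version string) (int, error) {
	parts := strings.Split(strings.TrimSpace(version), ".")
	if len(parts) < 3 {
		return 0, fmt.Errorf("unexpected version format %q", version)
	}
	return strconv.Atoi(parts[2])
}

// checkMinimumBuild is a hypothetical preflight-style check that rejects
// hosts older than Windows version 1803.
func checkMinimumBuild(version string) error {
	build, err := buildNumber(version)
	if err != nil {
		return err
	}
	if build < minBuild1803 {
		return fmt.Errorf("build %d is older than the required build %d (version 1803)", build, minBuild1803)
	}
	return nil
}

func main() {
	// 10.0.17763 is version 1809, 10.0.16299 is version 1709.
	for _, v := range []string{"10.0.17763", "10.0.16299"} {
		if err := checkMinimumBuild(v); err != nil {
			fmt.Println("FAIL:", v, err)
			continue
		}
		fmt.Println("OK:", v)
	}
}
```

How the version string is obtained is left out on purpose; per the item above, the existing system checks in types_windows.go still rely on powershell for that.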
/kind feature /area ecosystem /priority important-longterm /assign
cc @michmike @PatrickLang