Closed: kaolaaz163 closed this issue 5 months ago.
@kaolaaz163 I hit the same issue with my 4.14.2 single-node cluster and was able to get the operator working by using the release-4.14 branch from this repo and deploying it with the development instructions. I was following the guide at https://www.opensourcerers.org/2023/08/21/quick-start-to-smallest-openshift-cluster-for-windows-workload/ when I hit the issue you describe.
Approximate summary of steps on Ubuntu 22 WSL2 with Docker:

```shell
git clone --recurse-submodules -b release-4.14 https://github.com/openshift/windows-machine-config-operator.git
export OPENSHIFT_CI=false
export KUBE_SSH_KEY_PATH=/somewhere/windowskey
export OPERATOR_IMAGE=quay.io/some_user/wmco:test-4.14-1
cd windows-machine-config-operator
sudo ln -s /usr/bin/docker /usr/local/bin/podman
make base-img
make wmco-img IMG=$OPERATOR_IMAGE
docker push quay.io/some_user/wmco:test-4.14-1
hack/olm.sh run -k $KUBE_SSH_KEY_PATH
```
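The `sudo ln -s` step makes `podman` resolve to Docker system-wide. A less invasive alternative, if you prefer not to touch `/usr/local/bin`, is a small wrapper script on your `PATH` (a sketch of my own; the `$HOME/bin` location is my choice, not anything from the repo's docs):

```shell
# Put a podman -> docker shim on PATH instead of symlinking into /usr/local/bin.
# Assumes docker is installed; the wrapper simply forwards all arguments to it.
mkdir -p "$HOME/bin"
printf '#!/bin/sh\nexec docker "$@"\n' > "$HOME/bin/podman"
chmod +x "$HOME/bin/podman"
export PATH="$HOME/bin:$PATH"
```

This keeps the shim out of system directories and is trivial to undo by deleting the file.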
The scripts use podman, but docker works fine as well, and the install is very easy once you have the right source level. I used the current release-4.14 branch; the exact git commit was
commit 9456533c6d17c741e3f79c32613d4a1cdad6cf74 (HEAD -> release-4.14, origin/release-4.14)
The blog post linked above describes all the necessary steps, and I was very impressed by the ease of use for this operator, both in terms of running it and also building it. Thank you to the people who wrote up the developer docs!
@kaolaaz163 thanks for opening up the issue. Releasing a community 9.0.0/4.14 WMCO is in the pipeline.
Thanks @trevor-dolby-at-ibm-com, following your steps WMCO installed successfully. However, I then hit the following error while bootstrapping the Windows node, reporting that the windows-instance-config-daemon service could not be found. Can anyone help me take a look?
```
2023-12-05T07:27:56Z ERROR Reconciler error {"controller": "configmap", "controllerGroup": "", "controllerKind": "ConfigMap", "ConfigMap": {"name":"windows-instances","namespace":"openshift-windows-machine-config-operator"}, "namespace": "openshift-windows-machine-config-operator", "name": "windows-instances", "reconcileID": "4e0d7a24-64e3-4bfb-8aba-40463e363657", "error": "error configuring host with address 192.168.3.156: bootstrapping the Windows instance failed: unable to cleanup the Windows instance: error ensuring windows-instance-config-daemon Windows service is removed: error checking if windows-instance-config-daemon Windows service exists: error running sc.exe qc windows-instance-config-daemon: Process exited with status 1060"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226
2023-12-05T07:27:56Z DEBUG events error configuring host with address 192.168.3.156: bootstrapping the Windows instance failed: unable to cleanup the Windows instance: error ensuring windows-instance-config-daemon Windows service is removed: error checking if windows-instance-config-daemon Windows service exists: error running sc.exe qc windows-instance-config-daemon: Process exited with status 1060 {"type": "Warning", "object": {"kind":"ConfigMap","namespace":"openshift-windows-machine-config-operator","name":"windows-instances","uid":"3443a130-f3ca-4835-8860-b310f97b3bce","apiVersion":"v1","resourceVersion":"13302253"}, "reason": "InstanceSetupFailure"}
```
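For anyone debugging this: exit status 1060 from `sc.exe` is the Windows error code `ERROR_SERVICE_DOES_NOT_EXIST`, meaning the query failed because the windows-instance-config-daemon service was never registered on the node. A small sketch of how that status can be interpreted (my own illustration, not WMCO's actual code):

```shell
# ERROR_SERVICE_DOES_NOT_EXIST is the well-known Windows error code behind
# "Process exited with status 1060" in the log above.
classify_sc_exit() {
  case "$1" in
    0)    echo "exists" ;;  # sc.exe qc succeeded, the service is registered
    1060) echo "absent" ;;  # the service is not installed on the node
    *)    echo "error"  ;;  # some other sc.exe failure
  esac
}
classify_sc_exit 1060   # prints "absent"
```

So the cleanup path here is failing on a node where the service was never installed in the first place, which points at the bootstrap step rather than the service itself.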
I saw the above errors when using Windows Server 2019. After changing to Windows Server 2022, it worked.
@kaolaaz163 I was going to suggest the Windows version might be the problem (I'm running 10.0.20348.1) and it sounds like upgrading has indeed helped; glad to hear you're up and running now.
It works normally now, but kubelet still seems to have some problems: on the OpenShift targets page, the related endpoints report unauthorized errors. At the same time, the kubelet log shows the following errors.
I see the same errors in the "Metrics targets" view, and my kubelet worked for a day or so and is now showing errors like yours. Seems as if certificate rotation might have gone wrong in some way?
Can anyone help me to find what exactly is causing the kubelet exception?
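If stalled certificate rotation is the suspect behind the unauthorized endpoints, one quick check is the validity window of the kubelet's serving certificate. A sketch (the self-signed certificate here is a throwaway generated only to demonstrate the `openssl x509` inspection; on a real node you would point it at the kubelet's PEM file, or run `openssl s_client` against the kubelet port 10250):

```shell
# Generate a throwaway cert so the inspection command below has input;
# on a real node, substitute the kubelet's serving certificate file.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=kubelet-demo" \
  -keyout /tmp/kubelet-demo.key -out /tmp/kubelet-demo.crt 2>/dev/null

# Print the validity window; an expired notAfter would explain the 401s.
openssl x509 -in /tmp/kubelet-demo.crt -noout -dates
```

If the certificate turns out to be expired, looking for pending CSRs from the Windows node would be the next step.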
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Version: OKD
Cluster Version: 4.14.0-0.okd-2023-11-14-101924
Platform: Platform agnostic (type=none)
Proxy: No
WMCO Version: 6.0.0
Windows version: 2019
What happened?
When deploying WMCO through OperatorHub in an OKD 4.14 cluster, only the WMCO 6.0.0 version is available. When it is deployed, the following error is reported as WMCO's pod starts:
```
failed to validate required cluster configuration {"error": "error validating k8s version: Unsupported server version: v1.27.1-3351+b49f9d1356bca4-dirty. Supported versions are v1.24.x to v1.25.x", "errorVerbose": "Unsupported server version: v1.27.1-3351+b49f9d1356bca4-dirty. Supported versions are v1.24.x to v1.25.x\ngithub.com/openshift/windows-machine-config-operator/pkg/cluster.(config).validateK8sVersion\n\t/build/windows-machine-config-operator/pkg/cluster/config.go:141\ngithub.com/openshift/windows-machine-config-operator/pkg/cluster.(config).Validate\n\t/build/windows-machine-config-operator/pkg/cluster/config.go:148\nmain.main\n\t/build/windows-machine-config-operator/cmd/operator/main.go:102\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\nerror validating k8s version\ngithub.com/openshift/windows-machine-config-operator/pkg/cluster.(*config).Validate\n\t/build/windows-machine-config-operator/pkg/cluster/config.go:150\nmain.main\n\t/build/windows-machine-config-operator/cmd/operator/main.go:102\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"}
```
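This error is a version gate: WMCO 6.0.0 only accepts Kubernetes 1.24–1.25, while OKD 4.14 ships 1.27, so a newer WMCO build is required. The check is essentially the following (my simplified restatement of the validation for illustration, not the operator's actual code):

```shell
# Simplified restatement of WMCO 6.0.0's version gate: the server's minor
# version must fall inside the supported 1.24-1.25 window from the error above.
server_version="v1.27.1"                       # from the error message
server_minor=$(echo "$server_version" | cut -d. -f2)
if [ "$server_minor" -ge 24 ] && [ "$server_minor" -le 25 ]; then
  echo "supported"
else
  echo "unsupported"   # v1.27 fails the check, matching the log
fi
```

No configuration change on the cluster side can satisfy this; the fix is a WMCO release built for the newer Kubernetes version.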
What did you expect to happen?
WMCO runs successfully.
Steps to reproduce the issue
Install OKD version 4.14 and deploy WMCO through OperatorHub
Do you have a workaround for this issue?
No response