KvnOnWeb closed this issue 1 year ago
I run the latest version of OKD (4.11.0-0.okd-2022-12-02-145640) on Proxmox v7.3.3 with no problems.
My setup is almost the same as yours (3 masters + 3 workers + 1 bootstrap for installation). The differences are that my DNS server (bind9) runs on the Proxmox host itself, and I have a dedicated VM in front of OKD for load balancing: a minimal AlmaLinux 9.1 install with a simple nginx setup, handling LB/routing of the API and apps to OKD, plus iPXE during installation.
It could be that the problem comes from your LB (haproxy) not routing the traffic to OKD correctly.
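For comparison, a minimal haproxy layout for OKD UPI passes the API (6443), machine-config (22623), and ingress (80/443) ports through in TCP mode. A sketch of what that usually looks like (the server names and IPs below are placeholders, not taken from this thread):

```
# Hypothetical haproxy.cfg fragment for OKD UPI; names/IPs are placeholders.
frontend okd-api
    bind *:6443
    mode tcp
    default_backend okd-api

backend okd-api
    mode tcp
    balance roundrobin
    # comment out the bootstrap entry once bootstrap completes
    # server bootstrap 192.168.11.10:6443 check
    server okd-master-1 192.168.11.11:6443 check
    server okd-master-2 192.168.11.12:6443 check
    server okd-master-3 192.168.11.13:6443 check

frontend okd-ingress-https
    bind *:443
    mode tcp
    default_backend okd-ingress-https

backend okd-ingress-https
    mode tcp
    balance roundrobin
    server okd-worker-1 192.168.11.21:443 check
    server okd-worker-2 192.168.11.22:443 check
    server okd-worker-3 192.168.11.23:443 check
```

Ports 22623 (machine config server, pointing at the masters) and 80 (ingress HTTP, pointing at the workers) need equivalent frontend/backend pairs in the same TCP mode.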
We're going to need an archive produced by the must-gather tool, not its console output.
@titou10titou10 Thanks.
Yes, I checked many times. After each bootstrap completes, I comment out the bootstrap entry in haproxy and stop the bootstrap VM. My home lab with a single Proxmox server works fine.
I have 6 Proxmox servers with a private mesh VPN (WireGuard). Each server is on its own 192.168.1X.1/24 network:
Server 1: 192.168.11.1
Server 2: 192.168.12.1
Server X: 192.168.1X.1
Each server runs dnsmasq (PXE server configuration, VM IP assignment by MAC address, and DNS server configuration). Example dnsmasq config:
dhcp-range=192.168.11.10,192.168.11.250,12h
dhcp-lease-max=25
dhcp-host=D2:8E:FF:B3:01:73,192.168.11.11,okd-master-1
dhcp-host=12:8E:DE:B3:01:60,192.168.11.100,okd-services
dhcp-option=option:dns-server,192.168.11.100,1.1.1.1
dhcp-boot=pxelinux.0,,192.168.11.100
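With six servers each carrying its own copy of entries like these, a quick sanity check for accidental duplicates can help. The script below is purely illustrative (not part of the original setup): it parses merged dhcp-host lines and flags any MAC address, IP, or hostname that appears twice.

```python
import re

# Illustrative check for dnsmasq dhcp-host lines collected from several
# servers: flags duplicate MAC addresses, IPs, or hostnames.
def check_dhcp_hosts(lines):
    seen_mac, seen_ip, seen_name = {}, {}, {}
    problems = []
    for line in lines:
        m = re.match(r"dhcp-host=([0-9A-Fa-f:]+),([\d.]+),(\S+)", line.strip())
        if not m:
            continue  # skip non-dhcp-host lines
        mac, ip, name = m.group(1).lower(), m.group(2), m.group(3)
        for key, seen in ((mac, seen_mac), (ip, seen_ip), (name, seen_name)):
            if key in seen:
                problems.append(f"duplicate {key}: {line.strip()} / {seen[key]}")
            else:
                seen[key] = line.strip()
    return problems

conf = [
    "dhcp-host=D2:8E:FF:B3:01:73,192.168.11.11,okd-master-1",
    "dhcp-host=12:8E:DE:B3:01:60,192.168.11.100,okd-services",
    # a duplicate IP, as can happen when copying configs between servers
    "dhcp-host=AA:BB:CC:DD:EE:FF,192.168.11.11,okd-worker-1",
]
for problem in check_dhcp_hosts(conf):
    print(problem)
```

Run against the concatenated configs of all six servers, this would surface any overlap in static leases across the mesh.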
I have a problem with the authentication operator (503) and the console cannot contact oauth (timeout). Very similar to: https://github.com/okd-project/okd/issues/430
My bad. The archive: must-gather.local.1194632374733407674.tar.gz
I tried with networkType: OVNKubernetes. Same result.
must-gather archive: must-gather.local.7911081472470111102.tar.gz
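For context, the SDN is selected in install-config.yaml before the manifests are generated. A typical networking stanza with that switch applied might look like the following (the CIDRs shown are the stock OKD defaults, given here only as an assumed baseline):

```yaml
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OVNKubernetes   # previously OpenShiftSDN in the failing install
  serviceNetwork:
  - 172.30.0.0/16
```

Changing networkType requires a fresh install; it cannot be swapped on an already-bootstrapped cluster.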
Get \"https://oauth-openshift.apps.cloud.soyouweb.fr/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\nOAuthServerServiceEndpointAccessibleControllerDegraded: Get \"https://172.30.138.239:443/healthz\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
and
"WellKnownReadyControllerProgressing: kube-apiserver oauth endpoint https://192.168.12.12:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)"
in the authentication operator, as kube-apiserver never rolled out:
"NodeInstallerProgressing: 1 nodes are at revision 0; 1 nodes are at revision 3; 1 nodes are at revision 8"
installer pods cannot be created:
"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-8-okd-master-1_openshift-kube-apiserver_62b703a9-0b89-4801-92dc-a6af80a6b611_0(c09814543ff0b731abf40775238958a55b55b10d5c1059d5b47b070163b5d453): error adding pod openshift-kube-apiserver_installer-8-okd-master-1 to CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (add): [openshift-kube-apiserver/installer-8-okd-master-1/62b703a9-0b89-4801-92dc-a6af80a6b611:openshift-sdn]: error adding container to network \"openshift-sdn\": CNI request failed with status 400: 'the server was unable to return a response in the time allotted, but may still be processing the request (get pods installer-8-okd-master-1)\n'"
That looks like a weird networking issue.
Also, the must-gather is incomplete; it seems it's using a user cert and didn't fetch much data from the openshift-* namespaces. Could you check whether it works using the installer-generated kubeconfig? Also try, on the masters, via export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig
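Concretely, that check could be run like this on a master node (the must-gather re-run is an assumed next step, using the standard oc invocation; it requires a live cluster):

```shell
# On a master: use the node-local kubeconfig suggested above
export KUBECONFIG=/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost.kubeconfig

# Verify API access works with this kubeconfig
oc get nodes

# Then re-run must-gather so it can read the openshift-* namespaces
oc adm must-gather --dest-dir=/tmp/must-gather
```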
Thanks. I think so too, but I couldn't find the problem.
This must-gather should be complete: https://drive.google.com/file/d/1LMTX6oV4Mz7n5h1rQ5RSIog8G4ccNP_Q/view?usp=share_link
I have 6 proxmox servers with private mesh VPN (Wireguard). Each server on 192.168.1X.1/24 network.
Are you sure the private mesh VPN between the hypervisors isn't interfering here? I'm not 100% sure, but the must-gather does indicate to me some kind of peculiar networking issue outside of the cluster.
I installed a new cluster on a single Proxmox server and it works. I'm abandoning the installation across multiple Proxmox servers with a private mesh network. Thanks for your help.
Describe the bug
Hi! I have a problem when bootstrap completes: Console and authentication are not Ready. I get 503 errors: "APIServicesAvailable: "oauth.openshift.io.v1" is not ready: an attempt failed with statusCode = 503, err = the server is currently unable to handle the request..."
I have six Proxmox servers, one of which ("okd-services") hosts the DNS server, haproxy, and PXE server, plus 3 masters / 3 workers. DNS resolution between nodes/masters works (ping test). When I try to access the console, I reach the cluster but get the "not available" screen (the same as for a route that does not exist in the cluster).
Note: I have a home lab Proxmox with a single server (6 VMs: 3 masters / 3 workers) and it works.
Install configuration:
Version 4.11.0-0.okd-2022-12-02-145640, UPI / Platform: none
How reproducible: 100%
Log bundle
must-gather-okd-20221212.txt