Closed Javatar81 closed 2 months ago
Heads up @cluster/ocp4-admin - the "cluster/ocp4" label was applied to this issue.
Very strange, simple error message: no more details.
Issue found: the networks ocp4-network and ocp6-odf are not configured on any host, which is why the VMs can't start.
Removing these networks from the VM configuration lets the VMs start.
I don't know where those networks have to be configured from host point of view, so I can't fix it.
@knumskull thx for the quick look! ocp4-network is a "virtual network" using the ovirt-provider-ovn, not a physical one. It should be available on all nodes. It's used to create a totally private network for all the ocp4... cluster VMs.
It's showing as operational in the cluster. Any ideas on how to revive that?
Good point. Probably OVN is causing some trouble. Will investigate in that direction.
It might be related to a certificate issue
[root@rhev ~]# openssl verify -CAfile /etc/pki/ovirt-engine/apache-ca.pem /etc/pki/ovirt-engine/certs/apache.cer
O = Red Hat, OU = prod, CN = 2023 Certificate Authority RHCSv2
error 2 at 1 depth lookup: unable to get issuer certificate
error /etc/pki/ovirt-engine/certs/apache.cer: verification failed
Are all CA and intermediate CA certificates included in /etc/pki/ovirt-engine/apache-ca.pem?
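To check what is actually inside a bundle such as apache-ca.pem, here is a minimal sketch. The function names `count_certs` and `show_bundle` are made up for illustration. The `crl2pkcs7` / `pkcs7 -print_certs` pipeline is a common openssl idiom for listing every certificate in a PEM file; it is not something taken from this thread.

```shell
# Count the PEM certificates in a bundle. A CA chain file normally holds
# the root plus any intermediates, so a count of 1 deserves a second look.
count_certs() {
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Print subject and issuer of every certificate in the bundle.
# A server (leaf) certificate typically shows a host name or wildcard as CN.
show_bundle() {
    openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Assumed usage on the engine host:
#   count_certs /etc/pki/ovirt-engine/apache-ca.pem
#   show_bundle /etc/pki/ovirt-engine/apache-ca.pem
```

If the only subject listed is the server's own name, the file holds the server certificate rather than the CA chain.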
Ah! Good catch! My bad - we got new certs earlier this year, which were created by a new Red Hat internal CA certificate. Looks like I added the wrong cert to the apache-ca file: it is not the cert chain, but the server cert itself:
openssl x509 -in apache-ca.pem -noout -dates -subject
notBefore=Jan 2 16:42:32 2024 GMT
notAfter=Dec 27 16:42:32 2024 GMT
subject=O = Red Hat, OU = SolutionArchitectsDach, CN = *.stormshift.coe.muc.redhat.com
I dropped the correct root CA chain to /root/2023CertificateAuthorityRHCSv2_Chain.pem on the rhev host. With that, the verify checks out:
[root@rhev ovirt-engine]# openssl verify -CAfile /root/2023CertificateAuthorityRHCSv2_Chain.pem /etc/pki/ovirt-engine/certs/apache.cer
/etc/pki/ovirt-engine/certs/apache.cer: OK
Can I simply replace the apache-ca.pem or does this require a special procedure?
It might be sufficient to replace the apache-ca, followed by a restart of the engine and ovirt-provider-ovn.
But I can check in a couple of minutes again and follow up.
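A rough sketch of that replacement, keeping a backup of the old file. The helper name `install_ca_bundle` and the `.bak` suffix are assumptions for illustration. The systemd unit names `ovirt-engine` and `ovirt-provider-ovn` are assumed to match the services mentioned above, and the thread does not say whether anything else (for example httpd) needs a restart too.

```shell
# Replace a CA bundle, keeping a copy of the previous file next to it.
# $1 = new chain file, $2 = bundle to replace
install_ca_bundle() {
    new_chain=$1
    target=$2
    # Stop if the backup cannot be written, so the old file is never lost.
    cp -p "$target" "$target.bak" || return 1
    cp "$new_chain" "$target"
}

# Assumed usage on the engine host, once `openssl verify` succeeds with the new chain:
#   install_ca_bundle /root/2023CertificateAuthorityRHCSv2_Chain.pem /etc/pki/ovirt-engine/apache-ca.pem
#   systemctl restart ovirt-engine ovirt-provider-ovn
```

Copying over the existing file keeps its owner and mode, and restoring the `.bak` copy is enough to roll back.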
I replaced the CA certificate; the system in question is up and running again.
Due to other changes, I had to re-create the OVN networks, and they now show MTU 1500 in the UI. This is only a display issue in the UI. They're operating at 1442 inside the VMs.
Thx! I started the remaining VMs, too. They are all up and running now.
@Javatar81, the cluster is now suffering from expired certs. I approved all pending CSRs; the cluster should recover now. Please check again in an hour or so. This issue is resolved, I am closing it.
Thanks all for your great support. I still had to approve some missing CSRs, but now the cluster is healthy, and I will update soon once the cluster has fully recovered.
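For approving pending CSRs in bulk, something along these lines can work. This is a sketch, not the exact commands used here: the helper name `pending_csrs` is made up, and it assumes the CONDITION column is the last column of `oc get csr` output.

```shell
# Read `oc get csr --no-headers` output on stdin and print the names of
# requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Assumed usage against a live cluster (needs cluster-admin rights):
#   oc get csr --no-headers | pending_csrs | xargs -r oc adm certificate approve
```

New requests can show up after the first batch is approved, so the command may need to be repeated until nothing is left in Pending.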
Failed to start VMs from RHEV, e.g. ocp4bastion; other nodes do not start either.