Closed psy-q closed 4 months ago
Extracting the castore.pem CA certificate from each individual ESXi host and adding it to the installing machine's trust store allows the installation to continue.
Yes, this step is documented in https://docs.okd.io/4.14/installing/installing_vsphere/installing-vsphere-installer-provisioned-customizations.html#installation-adding-vcenter-root-certificates_installing-vsphere-installer-provisioned-customizations
I think there's a misunderstanding. As I've mentioned, we did extract the vSphere CA certificate, and the failure is not because of that. It's because the individual ESXi hosts are not trusted. Their certificates are issued by a different CA, one that's not part of the bundle vSphere offers for download.
The issuer there is "VMware Installer" and the only way to get to those CA certs seems to be to SSH into each ESXi host individually and get it from /etc/vmware/ssl.
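For anyone else collecting these by hand, here is a minimal sketch of merging the castore.pem copies into one deduplicated bundle before adding it to the trust store. It assumes the files were already copied off each host (for example with scp). The function name merge_ca_bundles is made up for illustration and is not part of any VMware or OKD tooling.

```python
import re

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def merge_ca_bundles(pem_texts):
    """Return a single PEM bundle containing each distinct certificate once.

    pem_texts is an iterable of strings, e.g. the contents of the
    castore.pem file fetched from each ESXi host (hypothetical input).
    """
    seen = set()
    merged = []
    for text in pem_texts:
        for block in PEM_CERT.findall(text):
            # Ignore whitespace differences so identical certs compare equal.
            key = "".join(block.split())
            if key not in seen:
                seen.add(key)
                merged.append(block.strip())
    return "\n".join(merged) + ("\n" if merged else "")
```

The result can be written to one file and added to the system trust store. On Fedora or RHEL-like systems that usually means placing it under /etc/pki/ca-trust/source/anchors/ and running update-ca-trust.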
Oh, hmm, the ESXi host certs are not signed by the CA in the vSphere bundle?
@psy-q your machine certificates are out of sync. The CA got updated, but the ESXi host certificates don't use the updated CA. You need to renew each ESXi host cert.
Thanks, we will give that a go: remove the individual ESXi CAs from the installer machine's trust store and retry. If it keeps working, then that was the problem. I'll report as soon as we know.
You should only be trusting the file downloaded from vCenter's welcome page ("Download trusted root CA certificates"). This is the first time I have ever heard of someone taking certificates from ESXi hosts and importing them.
I was surprised as well, I've done a few (OCP, though) installs on vSphere and never had to do this before.
I was told that this particular vSphere infrastructure was set up using thumbprint mode (vpxd.certmgmt.mode=thumbprint) and it can't be changed to proper certificate mode. So I won't be able to test whether switching to VMCA helps. But I'm pretty confident that if VMCA is enabled from the start, you'd never encounter the problem.
Maybe some people running in thumbprint mode and stumbling upon this issue find the information useful.
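For context, a certificate thumbprint is just a hash of the DER-encoded certificate, so in thumbprint mode a host is identified by that hash rather than by a CA chain. Below is a minimal sketch of computing one from the contents of a PEM file. The colon-separated SHA-1 format is an assumption about how vSphere displays thumbprints, and cert_thumbprint is an illustrative name.

```python
import hashlib
import ssl


def cert_thumbprint(pem_text, algorithm="sha1"):
    """Return the colon-separated, upper-case hex digest of one PEM certificate.

    pem_text must contain exactly one certificate; a multi-certificate
    bundle has to be split into individual certificates first.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.new(algorithm, der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Comparing the value computed from a host's certificate with what vCenter shows for that host is one way to confirm you are looking at the same certificate.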
I'm not sure how the cloud controller works for vSphere but I presume we will have to get it to trust each ESXi host's CA as well, unless it exclusively communicates with vCenter (we have that bit covered already).
vpxd.certmgmt.mode=thumbprint
but wouldn't you have the public key of the root ca to add to the trust?
As this is OKD, there is another solution (cough, cough): it's only a single-line change to ignore the certificates: https://github.com/openshift/installer/blob/d8a8d2b5969701413d532d12371439a0d63033ce/data/data/vsphere/pre-bootstrap/main.tf#L17
Then just rebuild the installer for the release you are installing. I would also suppose you would need to set OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE= to the release image.
Otherwise for OCP this would need to be an RFE.
documentation regarding using your own CA and ESXi host certificates (as previously mentioned vpxd.certmgmt.mode=thumbprint) https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.security.doc/GUID-122A4236-9696-4E1F-B9E8-738855946A93.html
I'm not sure how the cloud controller works for vSphere but I presume we will have to get it to trust each ESXi host's CA as well, unless it exclusively communicates with vCenter (we have that bit covered already).
The other components do not check the certificate.
vpxd.certmgmt.mode=thumbprint
but wouldn't you have the public key of the root ca to add to the trust?
So far what I see is that there are multiple CAs in use:
- /certs/download.zip
- /etc/vmware/ssl/castore.pem on each host
I'm clueless about VMware, but I'd presume that with VMCA instead of thumbprint, the vCenter CA would reissue certs for each of the ESXi hosts, and thus trusting that CA from the OKD installer host would be enough to also trust all the ESXis.
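One way to check that presumption from the installer host is to attempt a verified TLS handshake against each ESXi host using only the vCenter CA bundle. This is a sketch under assumptions: the host names and bundle path would be your own, host_is_trusted is an illustrative helper, and it exercises Python's TLS verification, which may not behave identically to the Go/Terraform code inside openshift-install.

```python
import socket
import ssl


def make_context(ca_bundle_path=None):
    """Build a verifying TLS context.

    With ca_bundle_path set, only the CAs in that file are trusted;
    with None, the system trust store is used.
    """
    return ssl.create_default_context(cafile=ca_bundle_path)


def host_is_trusted(host, ca_bundle_path=None, port=443, timeout=5.0):
    """Return (ok, reason) for a verified TLS handshake with host:port."""
    ctx = make_context(ca_bundle_path)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname enables hostname checking, so connect using
            # the name that appears in the host's certificate.
            with ctx.wrap_socket(sock, server_hostname=host):
                return True, "certificate chain verified"
    except ssl.SSLCertVerificationError as exc:
        return False, exc.verify_message
    except OSError as exc:
        # Covers refused connections, timeouts and other TLS errors.
        return False, f"connection problem: {exc}"
```

If every ESXi host passes with only the vCenter bundle trusted, the per-host castore.pem imports should not be needed. A failure reason along the lines of "unable to get local issuer certificate" would point at a host certificate issued by some other CA.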
As this is OKD, there is another solution (cough, cough): it's only a single-line change to ignore the certificates: https://github.com/openshift/installer/blob/d8a8d2b5969701413d532d12371439a0d63033ce/data/data/vsphere/pre-bootstrap/main.tf#L17
Oh, okay, that's adventurous :grin: I hope we can get vCenter set up properly instead of having to go this route, but it's good to know we could try it.
documentation regarding using your own CA and ESXi host certificates (as previously mentioned vpxd.certmgmt.mode=thumbprint) https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.security.doc/GUID-122A4236-9696-4E1F-B9E8-738855946A93.html
Thank you, I had forwarded that earlier but was told that it will probably be too much effort to switch. We don't even need to use our own CA, VMCA with its own CA would be fine. From what I'm reading it seems thumbprint mode is only for experiments and debugging, not for production use, so if we could get away from it that would be great and it would simplify OKD installation a lot.
Describe the bug
The installer terminates with an error after an initial Terraform stage if the VMware Installer CA isn't in the trust store of the machine running the installer. Extracting the castore.pem CA certificate from each individual ESXi host and adding it to the installing machine's trust store allows the installation to continue.
Our setup seems typical, with ESXi hosts showing up fine in vSphere and the installer host trusting vSphere's CA. Why is it necessary to also trust each individual ESXi host's VMware Installer CA, or in other words, why does openshift-install need to upload disk images there via HTTP POST?
It seems there's no option in the installer to trust unknown authorities or disable certificate validation for the vSphere/VMware components during installation.
Version
4.14.0-0.okd-2024-01-26-175629, IPI on vSphere
How reproducible
100%
Log bundle
The gather-logs stage is never reached, therefore there's no log bundle.