openshift / installer

Install an OpenShift 4.x cluster
https://try.openshift.com
Apache License 2.0

Installer skips Version 1 X.509 cert when added to additionalTrustBundle #2484

Closed jcordes73 closed 4 years ago

jcordes73 commented 5 years ago

Version

$ openshift-install version
built from commit 8c6c64d5a1465595f39da968f923141c48a30bfc
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:02efb41240a70e0eb32224c6bdf33b819c0767294906d03211785b4d7a2b34c7

Platform:

Baremetal

What happened?

Using openshift-install-linux-4.2.0-0.nightly-2019-10-01-210901 and the attached install-config.yaml, provisioning on the bootstrap node seems fine (until it is waiting on the master).

After the master starts, an error in its logs indicates that images are being pulled from the original source rather than from the mirror:

Oct 09 16:58:59 master.ocp4.ocplabs.com machine-config-daemon[1392]: I1009 16:58:59.422050 1392 run.go:16] Running: podman pull -q --authfile /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9
Oct 09 16:59:01 master.ocp4.ocplabs.com machine-config-daemon[1392]: time="2019-10-09T16:59:01Z" level=error msg="Error pulling image ref //quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: Error reading manifest sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized"
Oct 09 16:59:01 master.ocp4.ocplabs.com machine-config-daemon[1392]: Error: error pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9": unable to pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: unable to pull image: Error initializing source docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: Error reading manifest sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized
Oct 09 16:59:01 master.ocp4.ocplabs.com machine-config-daemon[1392]: W1009 16:59:01.346118 1392 run.go:40] podman failed: exit status 125; retrying...
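
To check a longer log for the same symptom, the registry host can be pulled out of each `podman pull` line and compared with the mirror. This is only a rough sketch: the function names are made up for illustration, the regular expression assumes digest-pinned references like the ones in the log above, and the mirror host has to be supplied by the caller.

```python
import re

# Matches the image reference at the end of a
# "podman pull ... <registry>/<repo>@sha256:<digest>" log entry.
PULL_RE = re.compile(r"podman pull\b.*?\s(\S+@sha256:[0-9a-f]{64})")


def pulled_registries(log_text):
    """Return the set of registry hosts named in podman pull commands."""
    hosts = set()
    for match in PULL_RE.finditer(log_text):
        image_ref = match.group(1)
        # The registry host is everything before the first "/".
        hosts.add(image_ref.split("/", 1)[0])
    return hosts


def bypasses_mirror(log_text, mirror_host):
    """True if any pull targets a registry other than the expected mirror."""
    return any(host != mirror_host for host in pulled_registries(log_text))
```

Feeding it the machine-config-daemon journal together with the mirror's `host:port` would flag any pull that still goes to the upstream registry.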

What you expected to happen?

Bootstrap completes, Master starts correctly

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?

CoreOS version rhcos-42.80.20190828.2-metal-uefi

References

Unknown

install-config-air-gapped.yaml.tar.gz

log-bundle-20191009190034.tar.gz

jcordes73 commented 5 years ago

Adding install-log

openshift_install.log.tar.gz

abhinavdahiya commented 5 years ago

can you provide the output of

cat /etc/os-release

from the bootstrap and control-plane hosts? @jcordes73

abhinavdahiya commented 5 years ago

sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9 in quay.io/openshift-release-dev/ocp-v4.0-art-dev: unauthorized: access to the requested resource is not authorized

Seems like there is also some authentication failure.

Can you verify

1) The authfile has the correct contents on the control-plane host,

i.e. /var/lib/kubelet/config.json has the pull secret you would expect for fetching images from 192.168.1.106:5555/ocp4/openshift4.

2) You can pull the content from the local registry on the control-plane host.

sudo podman pull --authfile /var/lib/kubelet/config.json 192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9

jcordes73 commented 5 years ago

OS Version

cat /etc/os-release (identical on bootstrap and control-plane node)

NAME="Red Hat Enterprise Linux CoreOS"
VERSION="42.80.20190828.2"
VERSION_ID="4.2"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 42.80.20190828.2 (Ootpa)"
ID="rhcos"
ID_LIKE="rhel fedora"
ANSI_COLOR="0;31"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.2"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.2"
OSTREE_VERSION=42.80.20190828.2

Authfile

sudo cat /var/lib/kubelet/config.json
{"auths":{"192.168.1.106:5555":{"auth":"YWRtaW46YWRtaW4xMjM=","email":"noemail@localhost"}}}

From install-config.yaml:
pullSecret: '{"auths":{"192.168.1.106:5555": {"auth": "YWRtaW46YWRtaW4xMjM=","email": "noemail@localhost"}}}'

So this looks identical
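
For anyone repeating this check, the two secrets differ only in whitespace, so comparing them as parsed JSON is less error-prone than comparing by eye. A minimal sketch, with helper names that are purely illustrative:

```python
import json


def same_pull_secret(authfile_text, install_config_secret):
    """Compare two pull secrets as parsed JSON, ignoring whitespace and key order."""
    return json.loads(authfile_text) == json.loads(install_config_secret)


def has_auth_for(secret_text, registry):
    """True if the secret carries a non-empty auth entry for the given registry."""
    entry = json.loads(secret_text).get("auths", {}).get(registry, {})
    return bool(entry.get("auth"))
```

With the contents shown above, `has_auth_for` is true for 192.168.1.106:5555 and false for quay.io. That would be consistent with the `unauthorized` error seen whenever the node falls back to pulling from quay.io.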

Pulling images

On control-plane:

sudo podman pull --authfile /var/lib/kubelet/config.json 192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9
Trying to pull 192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9...
ERRO[0000] Error pulling image ref //192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: Error initializing source docker://192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: pinging docker registry returned: Get https://192.168.1.106:5555/v2/: x509: certificate is not authorized to sign other certificates
Failed
Error: error pulling image "192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9": unable to pull 192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: unable to pull image: Error initializing source docker://192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9: pinging docker registry returned: Get https://192.168.1.106:5555/v2/: x509: certificate is not authorized to sign other certificates

on bootstrap-node:

sudo podman pull --authfile /var/lib/kubelet/config.json 192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9
Trying to pull 192.168.1.106:5555/ocp4/openshift4@sha256:e68a6bfd900b83c947195bc1a82ac467f139edf567c8492a1170275539f9b9f9...
Getting image source signatures
Copying blob 7dc19ca4b0d7 done
Copying config c37a96e3aa done
Writing manifest to image destination
Storing signatures
c37a96e3aa1ddba10c1da3065bed064cb7afd51153fb61b0830dd47e4ce712a9

Interestingly /var/lib/kubelet/config.json doesn't exist on the bootstrap-node.

abhinavdahiya commented 5 years ago

Get https://192.168.1.106:5555/v2/: x509: certificate is not authorized to sign other certificates Failed

Hmm, the CA is updated on the node by a service. Can you check it with

systemctl status coreos-update-ca-trust.service

or

journalctl -u coreos-update-ca-trust.service

and also post the contents of /etc/pki/ca-trust/source/anchors/?

wking commented 5 years ago

@abhinavdahiya tracked this down to a Version 1 X.509 cert being silently filtered, because keyUsage and Is CA were added in Version 3 (e.g. see here) and filtering here. So we should be allowing those v1 CAs through, and possibly warning on them, and maybe rejecting only non-CA v3 certs in verification. Or something like that.
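
To make the distinction concrete, here is a rough sketch in Python (not the installer's actual code, which is written in Go). In DER, the version sits in an optional `[0]` field at the start of tbsCertificate and is omitted when it has the default value v1, so a bundle entry can be checked without a full X.509 parser. All function names below are hypothetical, `keep_in_trust_bundle` is just one reading of the policy suggested above, and treating v2 like v1 is an extra assumption.

```python
import base64


def _read_header(der, offset):
    """Return (tag, content_start, content_length) for the DER element at offset."""
    tag = der[offset]
    first = der[offset + 1]
    if first < 0x80:  # short-form length: the byte is the length itself
        return tag, offset + 2, first
    num_bytes = first & 0x7F  # long form: the next num_bytes hold the length
    length = int.from_bytes(der[offset + 2:offset + 2 + num_bytes], "big")
    return tag, offset + 2 + num_bytes, length


def pem_to_der(pem_text):
    """Decode the first PEM certificate block in pem_text to DER bytes."""
    body = pem_text.split("-----BEGIN CERTIFICATE-----", 1)[1]
    body = body.split("-----END CERTIFICATE-----", 1)[0]
    return base64.b64decode("".join(body.split()))


def certificate_version(der):
    """Return the X.509 version (1, 2 or 3) of a DER-encoded certificate."""
    tag, start, _ = _read_header(der, 0)  # outer Certificate SEQUENCE
    if tag != 0x30:
        raise ValueError("not a DER SEQUENCE")
    tag, start, _ = _read_header(der, start)  # tbsCertificate SEQUENCE
    if tag != 0x30:
        raise ValueError("tbsCertificate is not a SEQUENCE")
    tag, start, _ = _read_header(der, start)  # [0] version, or serialNumber
    if tag != 0xA0:
        return 1  # version field omitted, so the default (v1) applies
    _, start, length = _read_header(der, start)  # INTEGER inside [0]
    return int.from_bytes(der[start:start + length], "big") + 1


def keep_in_trust_bundle(version, is_ca):
    """Policy sketch: pre-v3 certs cannot carry a CA flag, so let them through;
    reject only v3 certs that are explicitly not CAs."""
    if version < 3:
        return True
    return is_ca
```

Running `certificate_version(pem_to_der(...))` over each entry of the additionalTrustBundle would show which certs are v1 and therefore at risk of being dropped by a filter that keeps only certs flagged as CA.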

mbach04 commented 4 years ago

Is there a workaround for this in the meantime?