Closed praveenkumar closed 5 years ago
I have the same issue; it fails to pull the image. Output of `oc describe pod`:
Normal   Scheduled  46m                 default-scheduler      Successfully assigned openshift-cluster-api/clusterapi-manager-controllers-db4fbd5fc-bmlhw to ocp-master-0
Normal   Pulled     46m                 kubelet, ocp-master-0  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d1b7554af827639b86372f1d9c0fbdbe609c878381f40c433a1f788ec0fe5a7" already present on machine
Normal   Started    46m                 kubelet, ocp-master-0  Started container
Normal   Created    46m                 kubelet, ocp-master-0  Created container
Normal   Pulled     46m                 kubelet, ocp-master-0  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3d1b7554af827639b86372f1d9c0fbdbe609c878381f40c433a1f788ec0fe5a7" already present on machine
Normal   Started    46m                 kubelet, ocp-master-0  Started container
Normal   Created    46m                 kubelet, ocp-master-0  Created container
Normal   Pulling    46m (x2 over 46m)   kubelet, ocp-master-0  pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
Warning  Failed     46m (x4 over 46m)   kubelet, ocp-master-0  Error: ImagePullBackOff
Normal   BackOff    46m (x4 over 46m)   kubelet, ocp-master-0  Back-off pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
Warning  Failed     46m (x2 over 46m)   kubelet, ocp-master-0  Error: ErrImagePull
Warning  Failed     46m (x2 over 46m)   kubelet, ocp-master-0  Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905: unauthorized: authentication required
Normal   BackOff    36m (x39 over 46m)  kubelet, ocp-master-0  Back-off pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
Warning  Failed     1m (x188 over 46m)  kubelet, ocp-master-0  Error: ImagePullBackOff
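For anyone else triaging this, a quick way to spot every pod stuck on the same pull failure (a sketch; the pod name below is the one from the events above):

$ oc get pods --all-namespaces | grep -E 'ImagePullBackOff|ErrImagePull'
$ oc describe pod clusterapi-manager-controllers-db4fbd5fc-bmlhw -n openshift-cluster-api | grep -A2 Failed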
I tried today with the master branch and didn't see this issue, but with the 0.10.0 tag it occurs again, so it might be something to do with the way we tag the payload for this release?
I also rebuilt from the 0.9.1 tag and got it working with that.
0.10.0 is a special release. It is the beta1 build, which means that it targets a different set of content than 0.9.1. I also noticed that the libvirt container isn't pushed to quay (unlike its AWS counterpart), so I think it was just missed in the release process.
@crawford thanks, that explains why it is happening only with libvirt. Do we have any plan to push those missing libvirt containers to the quay registry?
I just wanted to add that my team is hitting this issue as well and is stuck on 0.9.1 for now until we find a way to run 0.10.x locally with libvirt.
Same issue with 0.10.1: the console is not deployed because there are no workers available... because the clusterapi-manager-controllers pod is not up... because it is trying to pull the image from an internal registry which I cannot access:
5m 5m 2 clusterapi-manager-controllers-db4fbd5fc-nhqnb.157c848781ca4d28 Pod spec.containers{controller-manager} Warning Failed kubelet, minwi-master-0 Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905: unauthorized: authentication required
This is a release issue, the installer just pins the update payloads the release folks push to quay.io. It's being tracked here.
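If your oc client is new enough to have the release subcommands, you can inspect what a given update payload actually references; a minimal sketch, assuming oc adm release info is available in your client build:

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.0.0-0.1

The image list it prints shows whether the component images live on quay.io or on registry.svc.ci.openshift.org.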
@wking any update on this one? I am using the latest master and have exactly the same issue
You must have a pull secret to api.ci in order to access libvirt, because the installer team has chosen not to build libvirt for OCP.
@smarterclayton thanks. How do I get one?
If you're not in the openshift GitHub organization, you can't get one.
libvirt isn't supported in the official installer. You need to use the origin variant or not use libvirt.
@smarterclayton what's the origin variant? Flavor isn't that important to me. I need a local 4.0 cluster :)
git clone openshift/installer, run hack/build-go.sh, and that's origin
For libvirt, you need to set TAGS=libvirt when building.
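Putting those steps together, the build looks roughly like this (a sketch; later transcripts in this thread use hack/build.sh, while older checkouts may name it hack/build-go.sh):

$ git clone https://github.com/openshift/installer
$ cd installer
$ TAGS=libvirt hack/build.sh
$ bin/openshift-install version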
Flavor isn't that important to me.
Unless you take steps to preserve the public (I think?) OKD builds at registry.svc.ci.openshift.org/openshift/origin-release, they're going to get garbage-collected after a few days. Master installer builds (currently the only way to get libvirt compiled in) point there by default, so your cluster should run fine for a few days and then probably start to die as the backing images get garbage-collected. Should be fine for dev-work (the libvirt target), but it's not going to work for long-running tasks out of the box.
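One way to guard against that garbage collection, assuming your oc client has the oc adm release mirror subcommand, is to copy the payload into a registry you control before installing (registry.example.com/ocp and the :v4.0 tag are placeholders; use whatever tag your installer build pins):

$ oc adm release mirror \
    --from=registry.svc.ci.openshift.org/openshift/origin-release:v4.0 \
    --to=registry.example.com/ocp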
@smarterclayton @wking but this is exactly what I am doing:
eugene@ivantsoft ~/go/src/github.com/openshift/installer ((HEAD detached at v0.10.0)) $ TAGS=libvirt hack/build.sh
+ RELEASE_IMAGE=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1
+ RHCOS_BUILD_NAME=47.249
+ minimum_go_version=1.10
++ go version
++ cut -d ' ' -f 3
+ current_go_version=go1.10.3
++ version 1.10.3
++ IFS=.
++ printf '%03d%03d%03d\n' 1 10 3
++ unset IFS
++ version 1.10
++ IFS=.
++ printf '%03d%03d%03d\n' 1 10
++ unset IFS
+ '[' 001010003 -lt 001010000 ']'
+ LAUNCH_PATH=/home/eugene/go/src/github.com/openshift/installer
++ dirname hack/build.sh
+ cd hack/..
++ go list -e -f '{{.Dir}}' github.com/openshift/installer
+ PACKAGE_PATH=/home/eugene/go/src/github.com/openshift/installer
+ test -z /home/eugene/go/src/github.com/openshift/installer
+ LOCAL_PATH=/home/eugene/go/src/github.com/openshift/installer
+ test /home/eugene/go/src/github.com/openshift/installer '!=' /home/eugene/go/src/github.com/openshift/installer
+ MODE=release
++ git describe --always --abbrev=40 --dirty
+ LDFLAGS=' -X main.version=v0.10.0'
+ TAGS=libvirt
+ OUTPUT=bin/openshift-install
+ export CGO_ENABLED=0
+ CGO_ENABLED=0
+ case "${MODE}" in
+ TAGS='libvirt release'
+ test -n quay.io/openshift-release-dev/ocp-release:4.0.0-0.1
+ LDFLAGS=' -X main.version=v0.10.0 -X github.com/openshift/installer/pkg/asset/ignition/bootstrap.defaultReleaseImage=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1'
+ test -n 47.249
+ LDFLAGS=' -X main.version=v0.10.0 -X github.com/openshift/installer/pkg/asset/ignition/bootstrap.defaultReleaseImage=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 -X github.com/openshift/installer/pkg/rhcos.buildName=47.249'
+ test '' '!=' y
+ go generate ./data
writing assets_vfsdata.go
+ echo 'libvirt release'
+ grep -q libvirt
+ export CGO_ENABLED=1
+ CGO_ENABLED=1
+ go build -ldflags ' -X main.version=v0.10.0 -X github.com/openshift/installer/pkg/asset/ignition/bootstrap.defaultReleaseImage=quay.io/openshift-release-dev/ocp-release:4.0.0-0.1 -X github.com/openshift/installer/pkg/rhcos.buildName=47.249' -tags 'libvirt release' -o bin/openshift-install ./cmd/openshift-install
eugene@ivantsoft ~/go/src/github.com/openshift/installer ((HEAD detached at v0.10.0)) $ bin/openshift-install create cluster
? SSH Public Key /home/eugene/.ssh/id_rsa.pub
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain tt.testing
? Cluster Name codenvy
? Pull Secret [? for help] ****************************************************************************************************************************************************************************************
INFO Fetching OS image: redhat-coreos-maipo-47.249-qemu.qcow2.gz
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.11.0+c69f926354 up
INFO Waiting up to 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E0130 08:28:24.401392 6213 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 148
INFO Destroying the bootstrap resources...
INFO Waiting up to 10m0s for the openshift-console route to be created...
FATAL waiting for openshift-console URL: context deadline exceeded
Events: events.log
Is this expected with 0.10.0? It works with 0.9.1 for me on Fedora 28 with libvirt. Unfortunately, 0.9.1 has a bug (already fixed: https://github.com/openshift/console/issues/1112), so that version does not work for me since I need to work with OperatorHub and the Eclipse Che operator integration.
What would be the best way to proceed? Give up on a local install with libvirt and look for AWS resources?
@eivantsov https://bugzilla.redhat.com/show_bug.cgi?id=1666561 is where this is tracked; please add your comments there.
...(HEAD detached at v0.10.0)...
Builds from tagged releases get update payloads from quay.io; see the Bugzilla bug linked above (twice now ;). Building from master should work better, but comes with its own caveats.
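If you are stuck on a tagged build, one possible workaround is to override the pinned payload at install time; this assumes the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE environment variable that master-era installers honor (the image reference here is a placeholder):

$ OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/openshift/origin-release:v4.0 bin/openshift-install create cluster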
What would be the best way to proceed? Give up on a local install with libvirt and look for AWS resources?
We will sell AWS support, so yeah, I expect that is the best route if you want fewer quirks at this stage.
@praveenkumar I don't have access to this issue
@wking would it be fair to say that 0.10.0+ libvirt installation is broken now?
I don't have access to this issue
@eivantsov if you log in using your Red Hat account then you will be able to access it atm.
would it be fair to say that 0.10.0+ libvirt installation is broken now?
@eivantsov this is only broken for tagged releases, which have released payloads; it does work from master, as @wking said.
@praveenkumar I have the same problem with master too
And I am logged in with my RH email
@wking @praveenkumar From master, libvirt, Fedora 28
eugene@ivantsoft ~/go/src/github.com/openshift/fourdotoh $ ../installer/bin/openshift-install create cluster
? SSH Public Key /home/eugene/.ssh/id_rsa.pub
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain tt.testing
? Cluster Name eugenious
? Pull Secret [? for help] ***********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
INFO Fetching OS image: redhat-coreos-maipo-47.287-qemu.qcow2.gz
INFO Creating cluster...
INFO Waiting up to 30m0s for the Kubernetes API...
INFO API v1.12.4+3434dda up
INFO Waiting up to 30m0s for the bootstrap-complete event...
ERROR: logging before flag.Parse: E0130 11:06:07.008600 14435 streamwatcher.go:109] Unable to decode an event from the watch stream: http2: server sent GOAWAY and closed the connection; LastStreamID=3, ErrCode=NO_ERROR, debug=""
WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 2318
INFO Destroying the bootstrap resources...
INFO Waiting up to 10m0s for the openshift-console route to be created...
FATAL waiting for openshift-console URL: context deadline exceeded
Last lines from install log:
time="2019-01-30T11:12:45+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:13:15+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:14:04+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:14:34+02:00" level=debug msg="Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)"
time="2019-01-30T11:15:04+02:00" level=debug msg="Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)"
time="2019-01-30T11:15:44+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:16:15+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:16:45+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:17:22+02:00" level=debug msg="Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)"
time="2019-01-30T11:17:52+02:00" level=debug msg="Still waiting for the console route..."
time="2019-01-30T11:18:07+02:00" level=fatal msg="waiting for openshift-console URL: context deadline exceeded"
There are a couple of failed pods with connection refused errors:
oc logs openshift-kube-apiserver-operator-5689d5dd48-nbnq5 -n=openshift-kube-apiserver-operator
I0130 09:22:49.259690 1 event.go:221] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-kube-apiserver-operator", Name:"openshift-kube-apiserver-operator", UID:"9b2e14ef-246d-11e9-bcb2-664f163f5f0f", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'InstallerPodFailed' Failed to create installer pod for revision 7 on node "eugenious-master-0": Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-7-eugenious-master-0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.267089 1 installer_controller.go:636] key failed with : Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods/installer-7-eugenious-master-0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.268766 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Role: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/roles?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.287194 1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Image: Get https://172.30.0.1:443/apis/config.openshift.io/v1/images?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.291247 1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Authentication: Get https://172.30.0.1:443/apis/config.openshift.io/v1/authentications?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.294226 1 reflector.go:134] github.com/openshift/cluster-kube-apiserver-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1alpha1.KubeAPIServerOperatorConfig: Get https://172.30.0.1:443/apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.296771 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ClusterRoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.298489 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.RoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/rolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.662625 1 resourcesync_controller.go:233] key failed with : Put https://172.30.0.1:443/apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs/instance/status: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.737589 1 leaderelection.go:270] error retrieving resource lock openshift-kube-apiserver-operator/openshift-cluster-kube-apiserver-operator-lock: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver-operator/configmaps/openshift-cluster-kube-apiserver-operator-lock: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:49.864810 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Secret: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/secrets?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.059761 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-config-managed/secrets?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.260944 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Secret: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/secrets?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.270199 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Role: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/roles?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.290580 1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Image: Get https://172.30.0.1:443/apis/config.openshift.io/v1/images?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.291841 1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Authentication: Get https://172.30.0.1:443/apis/config.openshift.io/v1/authentications?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.297195 1 reflector.go:134] github.com/openshift/cluster-kube-apiserver-operator/pkg/generated/informers/externalversions/factory.go:101: Failed to list *v1alpha1.KubeAPIServerOperatorConfig: Get https://172.30.0.1:443/apis/kubeapiserver.operator.openshift.io/v1alpha1/kubeapiserveroperatorconfigs?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.299254 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ClusterRoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.300194 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.RoleBinding: Get https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-kube-apiserver/rolebindings?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:50.463036 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ConfigMap: Get https://172.30.0.1:443/api/v1/namespaces/kube-system/configmaps?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0130 09:22:57.736504 1 event.go:259] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' '8406e42e-2470-11e9-9a93-0a580a800003 stopped leading'
I0130 09:22:57.736563 1 leaderelection.go:249] failed to renew lease openshift-kube-apiserver-operator/openshift-cluster-kube-apiserver-operator-lock: failed to tryAcquireOrRenew context deadline exceeded
F0130 09:22:57.736585 1 leaderelection.go:65] leaderelection lost
oc logs openshift-kube-scheduler-eugenious-master-0 -n=openshift-kube-scheduler
130 09:28:55.539845 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.StatefulSet: Get https://eugenious-api.tt.testing:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.563234 1 reflector.go:169] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:131
E0130 09:28:55.563941 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.PersistentVolume: Get https://eugenious-api.tt.testing:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.567032 1 reflector.go:169] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:131
E0130 09:28:55.567702 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1beta1.PodDisruptionBudget: Get https://eugenious-api.tt.testing:6443/apis/policy/v1beta1/poddisruptionbudgets?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.574635 1 reflector.go:169] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.585226 1 reflector.go:169] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:131
E0130 09:28:55.596884 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ReplicationController: Get https://eugenious-api.tt.testing:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.597899 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.PersistentVolumeClaim: Get https://eugenious-api.tt.testing:6443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:55.598112 1 reflector.go:169] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.599863 1 reflector.go:169] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.600301 1 reflector.go:169] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.602682 1 reflector.go:169] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:131
I0130 09:28:55.609265 1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178
E0130 09:28:55.610041 1 reflector.go:134] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178: Failed to list *v1.Pod: Get https://eugenious-api.tt.testing:6443/api/v1/pods?fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.610211 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Node: Get https://eugenious-api.tt.testing:6443/api/v1/nodes?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.610363 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.StorageClass: Get https://eugenious-api.tt.testing:6443/apis/storage.k8s.io/v1/storageclasses?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.610724 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ReplicaSet: Get https://eugenious-api.tt.testing:6443/apis/apps/v1/replicasets?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
E0130 09:28:55.614051 1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.Service: Get https://eugenious-api.tt.testing:6443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.126.11:6443: connect: connection refused
I0130 09:28:56.540950 1 reflector.go:169] Listing and watching *v1.StatefulSet from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.564111 1 reflector.go:169] Listing and watching *v1.PersistentVolume from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.568644 1 reflector.go:169] Listing and watching *v1beta1.PodDisruptionBudget from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.597321 1 reflector.go:169] Listing and watching *v1.ReplicationController from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.608238 1 reflector.go:169] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.611688 1 reflector.go:169] Listing and watching *v1.Pod from k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:178
I0130 09:28:56.614043 1 reflector.go:169] Listing and watching *v1.Node from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.616628 1 reflector.go:169] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.617114 1 reflector.go:169] Listing and watching *v1.ReplicaSet from k8s.io/client-go/informers/factory.go:131
I0130 09:28:56.618578 1 reflector.go:169] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:131
I0130 09:29:02.086276 1 wrap.go:47] GET /metrics: (2.775502ms) 200 [Prometheus/2.6.0 192.168.126.51:48566]
I0130 09:29:02.089486 1 wrap.go:47] GET /metrics: (5.676885ms) 200 [Prometheus/2.6.0 192.168.126.51:49234]
E0130 09:29:03.403148 1 event.go:259] Could not construct reference to: '&v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]v1.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'eugenious-master-0_d2bc12a7-2470-11e9-936a-52fdfc072182 stopped leading'
I0130 09:29:03.403228 1 leaderelection.go:249] failed to renew lease kube-system/kube-scheduler: failed to tryAcquireOrRenew context deadline exceeded
E0130 09:29:03.403256 1 server.go:207] lost master
lost lease
Events:
Current master (commit id d3ff3afe):
$ oc get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
kube-system etcd-member-minwi-master-0 1/1 Running 0 17m 192.168.126.11 minwi-master-0 <none>
openshift-apiserver-operator openshift-apiserver-operator-74cb6bbbfc-bf877 1/1 Running 1 10m 10.128.0.26 minwi-master-0 <none>
openshift-apiserver apiserver-t78ct 1/1 Running 0 3m51s 10.128.0.48 minwi-master-0 <none>
openshift-authentication-operator openshift-authentication-operator-7899cdcfd5-9ldbf 1/1 Running 0 6m56s 10.128.0.33 minwi-master-0 <none>
openshift-authentication-operator openshift-authentication-operator-7899cdcfd5-kzslq 1/1 Running 0 6m56s 10.128.0.34 minwi-master-0 <none>
openshift-authentication-operator openshift-authentication-operator-7899cdcfd5-zdg2r 1/1 Running 0 6m56s 10.128.0.35 minwi-master-0 <none>
openshift-authentication-operator origin-cluster-authentication-operator1-77868f4756-dzc6h 1/1 Running 1 6m56s 10.128.0.36 minwi-master-0 <none>
openshift-cloud-credential-operator cloud-credential-operator-c8c99b889-gq9cr 1/1 Running 0 7m5s 10.128.0.32 minwi-master-0 <none>
openshift-cluster-api cluster-autoscaler-operator-59fbb7468d-s2qdc 1/1 Running 0 10m 10.128.0.25 minwi-master-0 <none>
openshift-cluster-api clusterapi-manager-controllers-5764dd8cd7-r5h6v 4/4 Running 0 9m51s 10.128.0.28 minwi-master-0 <none>
openshift-cluster-api machine-api-operator-587656b779-vx9cc 1/1 Running 0 10m 10.128.0.27 minwi-master-0 <none>
openshift-cluster-machine-approver machine-approver-64fbd8bc6c-mqrrr 1/1 Running 0 18m 192.168.126.11 minwi-master-0 <none>
openshift-cluster-version cluster-version-operator-c4599b87d-27fgp 1/1 Running 0 18m 192.168.126.11 minwi-master-0 <none>
openshift-controller-manager-operator openshift-controller-manager-operator-677b796b6f-g896f 1/1 Running 4 7m56s 10.128.0.30 minwi-master-0 <none>
openshift-controller-manager controller-manager-tqkcc 1/1 Running 1 3m23s 10.128.0.51 minwi-master-0 <none>
openshift-core-operators openshift-service-cert-signer-operator-65664df755-tctzz 1/1 Running 0 18m 10.128.0.2 minwi-master-0 <none>
openshift-dns-operator dns-operator-6cddb84ddd-264mp 1/1 Running 0 18m 10.128.0.3 minwi-master-0 <none>
openshift-dns dns-default-6bzhh 2/2 Running 0 8m4s 10.129.0.2 minwi-worker-0-52gn4 <none>
openshift-dns dns-default-g2tpm 2/2 Running 0 15m 10.128.0.8 minwi-master-0 <none>
openshift-image-registry cluster-image-registry-operator-5544bb9f48-8flqd 1/1 Running 0 6m55s 10.128.0.37 minwi-master-0 <none>
openshift-image-registry image-registry-77f87b797f-fhvrf 1/1 Running 0 6m19s 10.129.0.6 minwi-worker-0-52gn4 <none>
openshift-image-registry node-ca-955vc 1/1 Running 0 6m11s 10.128.0.40 minwi-master-0 <none>
openshift-image-registry node-ca-kgzl8 1/1 Running 0 6m11s 10.129.0.7 minwi-worker-0-52gn4 <none>
openshift-ingress-operator ingress-operator-5ff8c7dfdd-8hkc5 1/1 Running 0 6m53s 10.128.0.38 minwi-master-0 <none>
openshift-ingress router-default-654ff569fd-qpkjd 1/1 Running 0 6m32s 192.168.126.51 minwi-worker-0-52gn4 <none>
openshift-kube-apiserver-operator openshift-kube-apiserver-operator-5689d5dd48-6m9d5 0/1 CrashLoopBackOff 4 18m 10.128.0.4 minwi-master-0 <none>
openshift-kube-apiserver installer-1-minwi-master-0 0/1 Completed 0 14m 10.128.0.9 minwi-master-0 <none>
openshift-kube-apiserver installer-2-minwi-master-0 0/1 Completed 0 14m 10.128.0.11 minwi-master-0 <none>
openshift-kube-apiserver installer-3-minwi-master-0 0/1 Completed 0 6m2s 10.128.0.41 minwi-master-0 <none>
openshift-kube-apiserver installer-4-minwi-master-0 0/1 Completed 0 4m44s 10.128.0.46 minwi-master-0 <none>
openshift-kube-apiserver installer-5-minwi-master-0 0/1 Completed 0 3m21s 10.128.0.52 minwi-master-0 <none>
openshift-kube-apiserver installer-6-minwi-master-0 0/1 Completed 0 100s 10.128.0.56 minwi-master-0 <none>
openshift-kube-apiserver openshift-kube-apiserver-minwi-master-0 1/1 Running 0 66s 192.168.126.11 minwi-master-0 <none>
openshift-kube-apiserver revision-pruner-0-minwi-master-0 0/1 Completed 0 14m 10.128.0.10 minwi-master-0 <none>
openshift-kube-apiserver revision-pruner-2-minwi-master-0 0/1 Completed 0 14m 10.128.0.12 minwi-master-0 <none>
openshift-kube-apiserver revision-pruner-3-minwi-master-0 0/1 Completed 0 6m2s 10.128.0.43 minwi-master-0 <none>
openshift-kube-apiserver revision-pruner-4-minwi-master-0 0/1 Completed 0 4m44s 10.128.0.47 minwi-master-0 <none>
openshift-kube-apiserver revision-pruner-5-minwi-master-0 0/1 Completed 0 3m21s 10.128.0.53 minwi-master-0 <none>
openshift-kube-apiserver revision-pruner-6-minwi-master-0 0/1 Completed 0 100s 10.128.0.57 minwi-master-0 <none>
openshift-kube-controller-manager-operator kube-controller-manager-operator-5b8fcd96c8-96v8d 1/1 Running 4 13m 10.128.0.14 minwi-master-0 <none>
openshift-kube-controller-manager installer-1-minwi-master-0 0/1 Completed 0 12m 10.128.0.16 minwi-master-0 <none>
openshift-kube-controller-manager installer-2-minwi-master-0 0/1 Completed 0 12m 10.128.0.17 minwi-master-0 <none>
openshift-kube-controller-manager installer-3-minwi-master-0 0/1 Completed 0 5m4s 10.128.0.44 minwi-master-0 <none>
openshift-kube-controller-manager installer-4-minwi-master-0 0/1 Completed 0 3m35s 10.128.0.49 minwi-master-0 <none>
openshift-kube-controller-manager installer-5-minwi-master-0 0/1 Completed 0 2m 10.128.0.54 minwi-master-0 <none>
openshift-kube-controller-manager installer-6-minwi-master-0 0/1 ContainerCreating 0 6s <none> minwi-master-0 <none>
openshift-kube-controller-manager openshift-kube-controller-manager-minwi-master-0 1/1 Running 3 109s 192.168.126.11 minwi-master-0 <none>
openshift-kube-controller-manager revision-pruner-0-minwi-master-0 0/1 Completed 0 12m 10.128.0.19 minwi-master-0 <none>
openshift-kube-controller-manager revision-pruner-2-minwi-master-0 0/1 Completed 0 12m 10.128.0.18 minwi-master-0 <none>
openshift-kube-controller-manager revision-pruner-3-minwi-master-0 0/1 Completed 0 5m3s 10.128.0.45 minwi-master-0 <none>
openshift-kube-controller-manager revision-pruner-4-minwi-master-0 0/1 Completed 0 3m36s 10.128.0.50 minwi-master-0 <none>
openshift-kube-controller-manager revision-pruner-5-minwi-master-0 0/1 Completed 0 2m 10.128.0.55 minwi-master-0 <none>
openshift-kube-controller-manager revision-pruner-6-minwi-master-0 0/1 ContainerCreating 0 7s <none> minwi-master-0 <none>
openshift-kube-scheduler-operator openshift-kube-scheduler-operator-85cff5b9bd-2k6v8 1/1 Running 0 13m 10.128.0.13 minwi-master-0 <none>
openshift-kube-scheduler installer-1-minwi-master-0 0/1 Completed 0 13m 10.128.0.15 minwi-master-0 <none>
openshift-kube-scheduler openshift-kube-scheduler-minwi-master-0 1/1 Running 4 12m 192.168.126.11 minwi-master-0 <none>
openshift-kube-scheduler revision-pruner-0-minwi-master-0 0/1 OOMKilled 0 11m 10.128.0.20 minwi-master-0 <none>
openshift-machine-config-operator machine-config-controller-d7c89fcb5-gmzh2 1/1 Running 0 10m 10.128.0.24 minwi-master-0 <none>
openshift-machine-config-operator machine-config-daemon-m8chj 1/1 Running 0 8m4s 192.168.126.51 minwi-worker-0-52gn4 <none>
openshift-machine-config-operator machine-config-daemon-mf4jb 1/1 Running 0 10m 192.168.126.11 minwi-master-0 <none>
openshift-machine-config-operator machine-config-operator-6d45f58bbc-92kl2 1/1 Running 0 11m 10.128.0.21 minwi-master-0 <none>
openshift-machine-config-operator machine-config-server-6gr95 1/1 Running 0 10m 192.168.126.11 minwi-master-0 <none>
openshift-monitoring cluster-monitoring-operator-f576575b5-cxbjq 1/1 Running 0 6m53s 10.128.0.39 minwi-master-0 <none>
openshift-monitoring grafana-78765ddcc7-cr9bq 2/2 Running 1 112s 10.129.0.8 minwi-worker-0-52gn4 <none>
openshift-monitoring prometheus-operator-6df5775484-jsrnj 1/1 Running 2 6m30s 10.129.0.4 minwi-worker-0-52gn4 <none>
openshift-multus multus-5cwqx 1/1 Running 4 17m 192.168.126.11 minwi-master-0 <none>
openshift-multus multus-g8dbk 1/1 Running 0 8m4s 192.168.126.51 minwi-worker-0-52gn4 <none>
openshift-network-operator network-operator-6484475bbd-zv66l 1/1 Running 0 18m 192.168.126.11 minwi-master-0 <none>
openshift-operator-lifecycle-manager catalog-operator-58b8fb9564-8wf28 1/1 Running 0 11m 10.128.0.23 minwi-master-0 <none>
openshift-operator-lifecycle-manager olm-operator-686859c7c9-qppqh 1/1 Running 0 11m 10.128.0.22 minwi-master-0 <none>
openshift-operator-lifecycle-manager olm-operators-5cv8v 1/1 Running 0 10m 10.129.0.3 minwi-worker-0-52gn4 <none>
openshift-operator-lifecycle-manager packageserver-7c95d754d6-mpwmz 1/1 Running 2 6m3s 10.128.0.42 minwi-master-0 <none>
openshift-sdn ovs-q65kv 1/1 Running 0 8m4s 192.168.126.51 minwi-worker-0-52gn4 <none>
openshift-sdn ovs-x5jfc 1/1 Running 0 17m 192.168.126.11 minwi-master-0 <none>
openshift-sdn sdn-controller-26skg 0/1 CrashLoopBackOff 4 17m 192.168.126.11 minwi-master-0 <none>
openshift-sdn sdn-gksbr 1/1 Running 0 8m4s 192.168.126.51 minwi-worker-0-52gn4 <none>
openshift-sdn sdn-xbth8 1/1 Running 1 17m 192.168.126.11 minwi-master-0 <none>
openshift-service-cert-signer apiservice-cabundle-injector-5f54f9578b-bcvqb 1/1 Running 0 15m 10.128.0.6 minwi-master-0 <none>
openshift-service-cert-signer configmap-cabundle-injector-54cc474585-7hhc7 1/1 Running 0 15m 10.128.0.5 minwi-master-0 <none>
openshift-service-cert-signer service-serving-cert-signer-d9987ff6d-57nr6 1/1 Running 0 15m 10.128.0.7 minwi-master-0 <none>
$ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
minwi-master-0 Ready master 18m v1.12.4+f39ab668d3 192.168.126.11 <none> Red Hat CoreOS 4.0 3.10.0-957.5.1.el7.x86_64 cri-o://1.12.5-1.rhaos4.0.git97ebf9b.el7-dev
minwi-worker-0-52gn4 Ready worker 8m40s v1.12.4+f39ab668d3 192.168.126.51 <none> Red Hat CoreOS 4.0 3.10.0-957.5.1.el7.x86_64 cri-o://1.12.5-1.rhaos4.0.git97ebf9b.el7-dev
# installer output
...
INFO Waiting up to 10m0s for the openshift-console route to be created...
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
DEBUG Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)
DEBUG Still waiting for the console route: the server is currently unable to handle the request (get routes.route.openshift.io)
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route...
DEBUG Still waiting for the console route: the server could not find the requested resource (get routes.route.openshift.io)
FATAL waiting for openshift-console URL: context deadline exceeded
Findings:
I think the issues can be worked around by adding more cpu/ram to the master node (so: create the manifests, modify the cpu/ram specs for the masters, and create the cluster), but I will need to find somewhere else to test it; my laptop is not capable of doing that.
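For the libvirt platform the installer also exposes Terraform variables that bump the master resources without hand-editing manifests; a sketch, assuming the TF_VAR_libvirt_master_memory / TF_VAR_libvirt_master_vcpu variables from the installer's libvirt how-to:

$ export TF_VAR_libvirt_master_memory=8192   # MiB
$ export TF_VAR_libvirt_master_vcpu=4
$ bin/openshift-install create cluster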
Just in case, in order to make this 'work' on my laptop (T480s, 16 GB RAM) I need to:
@e-minguez I have 24 GB and while installing I do not see all of my RAM being used.
I do have 16 GB, and unless I add 4 GB more of swap (goodbye NVMe!) the installer is OOM-killed :)
It seems that I stumbled upon the same issue using version 0.11.0 and AWS as the target infrastructure. Here's the script output
Here's the .openshift_install.log: https://paste.fedoraproject.org/paste/zhe4v4FPxfel9vm01Nvvjw
I am in exactly the same situation as @eivantsov. I need to test the operator in the marketplace and I am also hitting this issue in the clusterapi-manager-controllers pod. No console for me :(
Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-15-010905: unauthorized: authentication required
and I can confirm that it's almost impossible to run the installer on a T480s with 16 GB of RAM. 0.9.1 worked, but the marketplace was broken there. Isn't it possible to run the 0.9.1 version of the installer and upgrade only the console in it? The whole cluster comes with the console operator; can't I just modify the CRD for the console to use the version with the fix?
@wking Since this payload is now available on quay.io, and according to https://bugzilla.redhat.com/show_bug.cgi?id=1666561 it should be available with a new tag of the installer, when can we do a new tag release of the installer and have all payloads come from the quay.io side?
I can still see that issue on master:
Warning Failed 3m (x2 over 4m) kubelet, test-1-master-0 Error: ErrImagePull
Warning Failed 3m (x2 over 4m) kubelet, test-1-master-0 Failed to pull image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-25-205123@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225": rpc error: code = Unknown desc = Error reading manifest sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225 in registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-25-205123: unauthorized: authentication required
Warning Failed 3m (x4 over 4m) kubelet, test-1-master-0 Error: ImagePullBackOff
Normal BackOff 3m (x3 over 4m) kubelet, test-1-master-0 Back-off pulling image "registry.svc.ci.openshift.org/ocp/4.0-art-latest-2019-01-25-205123@sha256:8b848ebe6ba72a678300a0fa9b7749bcef3b4230e355e1c789527e6d1c615225"
I've been having similar issues, which I put down to memory sizing (#1041). I've increased the memory for master-0, but I'm still not getting any workers starting.
Hi guys,
The minimum required is:
1x master, 2x workers
*router - needs two workers
Best, Fábio Sbano
*router - needs two workers
Is there a router pull or docs I can link for that? I guess we need to bump our libvirt default to catch up.
@wking,
Are you using the latest version of the installer?
Regards, Fábio Sbano
Are you using the latest version of the installer?
I haven't run it on libvirt in a while, but if the router for some reason needs 2+ compute nodes now, we'd want to update the default and some validation. Or is the issue total compute memory constraints or similar, and not actually compute replica count?
@wking,
With replica count=2, you cannot listen on ports 443 and 80 twice on the same physical or virtual host.
Best Regards, Fábio Sbano
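For single-worker or single-node setups, a possible workaround (not confirmed in this thread, and it trades away router redundancy) is to scale the default router down to one replica so the host ports are only claimed once:

$ oc -n openshift-ingress scale deployment/router-default --replicas=1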
I needed to change some things in my setup configuration file, adjust the memory, and solve a problem with dnsmasq.
Regards, Fábio Sbano
.tf:
bootstrap - 32 GB
master - 32 GB
2x worker - 4 GB
and a wildcard "*.apps.test1.tt.testing" in BIND, listening on IP 192.168.126.1
Regards, Fábio Sbano
Hi,
[fsbano@voyager-1 ~]$ oc get deployment
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
router-default   0/2     2            0           8m23s
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ oc describe deployment/router-default
Name:                   router-default
Namespace:              openshift-ingress
CreationTimestamp:      Fri, 08 Mar 2019 21:16:46 -0300
Labels:                 app=router
                        ingress.openshift.io/clusteringress=default
Annotations:            deployment.kubernetes.io/revision=1
Selector:               app=router,router=router-default
Replicas:               2 desired | 2 updated | 2 total | 0 available | 2 unavailable
Type Reason Age From Message
Normal  ScalingReplicaSet  8m  deployment-controller  Scaled up replica set router-default-779745f684 to 2
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ oc edit deployments/router-default
nodeSelector:
node-role.kubernetes.io/worker: ""
Best Regards, Fábio Sbano
Hi,
My install-config.yaml file
[root@voyager-1 ~]# more install-config.yaml
apiVersion: v1beta3
baseDomain: tt.testing
compute:
$ ./openshift-install create cluster --dir . --log-level debug
Best Regards, Fábio Sbano
Hey,
Step-by-Step
[fsbano@voyager-1 ~]$ oc get pod --all-namespaces | egrep -v '(Running|Completed)'
NAMESPACE                           NAME                                      READY   STATUS             RESTARTS   AGE
openshift-console                   console-6d6ffd4444-9299h                  0/1     CrashLoopBackOff   13         59m
openshift-console                   console-6d6ffd4444-p2hpf                  0/1     CrashLoopBackOff   13         59m
openshift-kube-controller-manager   revision-pruner-3-jaguar-kt74v-master-0   0/1     OOMKilled          0          75m
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ sudo cat /var/named/named.apps.jaguar.fsbano.io
$TTL 1D
@ IN SOA @ rname.invalid. (
0 ; serial
1D ; refresh
1H ; retry
1W ; expire
3H ) ; minimum
NS @
A 192.168.126.1
* A 192.168.126.51
* A 192.168.126.52
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io
Host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io not found: 3(NXDOMAIN)
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ sudo service named restart
Redirecting to /bin/systemctl restart named.service
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ host prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io
prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io has address 192.168.126.51
prometheus-k8s-openshift-monitoring.apps.jaguar.fsbano.io has address 192.168.126.52
[fsbano@voyager-1 ~]$ 👍
[fsbano@voyager-1 ~]$ oc scale deployment.apps/console --replicas=0
deployment.apps/console scaled
[fsbano@voyager-1 ~]$ oc get pod
NAME                       READY   STATUS        RESTARTS   AGE
console-6d6ffd4444-9299h   0/1     Terminating   14         64m
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ oc get pod
No resources found.
[fsbano@voyager-1 ~]$
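(The step that brings the console back appears to be elided between these two listings; presumably the deployment was scaled back up, something like:)

[fsbano@voyager-1 ~]$ oc scale deployment.apps/console --replicas=2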
[fsbano@voyager-1 ~]$ oc get pod
NAME                       READY   STATUS              RESTARTS   AGE
console-6d6ffd4444-6c4p2   0/1     ContainerCreating   0          3s
console-6d6ffd4444-9tfhv   0/1     ContainerCreating   0          3s
[fsbano@voyager-1 ~]$
[fsbano@voyager-1 ~]$ oc get pod
NAME                       READY   STATUS    RESTARTS   AGE
console-6d6ffd4444-skhg6   1/1     Running   0          2m52s
console-6d6ffd4444-wcd6g   1/1     Running   0          2m52s
[fsbano@voyager-1 ~]$ 💯
Best Regards, Good Night Everybody Fábio Sbano
Images!
Best Regards, Fábio Sbano
*router - needs two workers
Is there a router pull or docs I can link for that? I guess we need to bump our libvirt default to catch up.
Can I send a pull request?
Best Regards, Fabio Sbano
@ssbano From which commit (component) is it required to have 2 workers to make it work on the libvirt platform? If this is a hard requirement, it would be problematic for us (the CodeReady Containers team); we are trying a single-node cluster (with no workers).
I tested the 0.14.0 tag with a single worker and everything worked as expected, but today when I tried master I got the following error (is this because of that limitation?):
$ oc get events -n openshift-ingress
LAST SEEN TYPE REASON OBJECT MESSAGE
8m23s Warning FailedCreate replicaset/router-default-76d66d6844 Error creating: pods "router-default-76d66d6844-" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 80: Host ports are not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 443: Host ports are not allowed to be used spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 1936: Host ports are not allowed to be used]
@praveenkumar,
I was using master until yesterday. This morning I downgraded to 0.14 and it is also running perfectly.
Could you describe your setup?
PS: I saw that they updated the image to 20190310
Regards, Fábio Sbano
@praveenkumar,
With two workers it works; with only one worker it will always be "Pending".
Please,
oc project openshift-ingress
oc describe router-default
see #1395
Regards, Fábio Sbano
I think this is likely a duplicate of #1007
Already fixed.
Version
Platform (aws|libvirt|openstack):
libvirt
What happened?
Installer failed to get the console route and exited after 10 minutes due to context deadline exceeded.
What you expected to happen?
Installer should be able to create the cluster without any issue.
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
Looks like the issue is with the clusterapi-manager-controllers-db4fbd5fc-f7x6x pod, since it is in ImagePullBackOff state and the logs show that it is not able to identify the master node.
References