openshift-metal3 / dev-scripts

Scripts to automate development/test setup for openshift integration with https://github.com/metal3-io/
Apache License 2.0

10_deploy_rook.sh fails with error: timed out waiting for the condition while running 'oc wait --for condition=ready pod -l app=rook-ceph-tools -n openshift-storage --timeout=1200s' #590

Closed mcornea closed 5 years ago

mcornea commented 5 years ago

Describe the bug

10_deploy_rook.sh fails with error: timed out waiting for the condition while running 'oc wait --for condition=ready pod -l app=rook-ceph-tools -n openshift-storage --timeout=1200s'

To Reproduce

Run make. Wait to finish.
Run 09_deploy_kubevirt.sh
Run 10_deploy_rook.sh

Expected/observed behavior

Observed behavior:

[centos@provisionhost-0 dev-scripts]$ ./10_deploy_rook.sh 
+ source logging.sh
+++ dirname ./10_deploy_rook.sh
++ LOGDIR=./logs
++ '[' '!' -d ./logs ']'
+++ basename ./10_deploy_rook.sh .sh
+++ date +%F-%H%M%S
++ LOGFILE=./logs/10_deploy_rook-2019-06-03-204530.log
++ echo 'Logging to ./logs/10_deploy_rook-2019-06-03-204530.log'
Logging to ./logs/10_deploy_rook-2019-06-03-204530.log
++ exec
+++ tee ./logs/10_deploy_rook-2019-06-03-204530.log
+ source common.sh
+++ go env
++ eval 'GOARCH="amd64"
GOBIN=""
GOCACHE="/home/centos/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/centos/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build535695688=/tmp/go-build -gno-record-gcc-switches"'
+++ GOARCH=amd64
+++ GOBIN=
+++ GOCACHE=/home/centos/.cache/go-build
+++ GOEXE=
+++ GOFLAGS=
+++ GOHOSTARCH=amd64
+++ GOHOSTOS=linux
+++ GOOS=linux
+++ GOPATH=/home/centos/go
+++ GOPROXY=
+++ GORACE=
+++ GOROOT=/usr/lib/golang
+++ GOTMPDIR=
+++ GOTOOLDIR=/usr/lib/golang/pkg/tool/linux_amd64
+++ GCCGO=gccgo
+++ CC=gcc
+++ CXX=g++
+++ CGO_ENABLED=1
+++ GOMOD=
+++ CGO_CFLAGS='-g -O2'
+++ CGO_CPPFLAGS=
+++ CGO_CXXFLAGS='-g -O2'
+++ CGO_FFLAGS='-g -O2'
+++ CGO_LDFLAGS='-g -O2'
+++ PKG_CONFIG=pkg-config
+++ GOGCCFLAGS='-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build535695688=/tmp/go-build -gno-record-gcc-switches'
++++ dirname common.sh
+++ cd .
+++ pwd
++ SCRIPTDIR=/home/centos/dev-scripts
+++ whoami
++ USER=centos
++ '[' -z '' ']'
++ '[' -f /home/centos/dev-scripts/config_centos.sh ']'
++ echo 'Using CONFIG /home/centos/dev-scripts/config_centos.sh'
Using CONFIG /home/centos/dev-scripts/config_centos.sh
++ CONFIG=/home/centos/dev-scripts/config_centos.sh
++ source /home/centos/dev-scripts/config_centos.sh
+++ set +x
+++ BOOTSTRAP_SSH_READY=2500
+++ NODES_PLATFORM=baremetal
+++ INT_IF=eth0
+++ PRO_IF=eth1
+++ EXT_IF=
+++ ROOT_DISK=/dev/sda
+++ NODES_FILE=/home/centos/instackenv.json
+++ MANAGE_BR_BRIDGE=n
+++ NUM_WORKERS=3
+++ CLUSTER_NAME=rhhi-virt-cluster
+++ BASE_DOMAIN=qe.lab.redhat.com
+++ RHCOS_IMAGE_URL=
++ ADDN_DNS=
++ EXT_IF=
++ PRO_IF=eth1
++ MANAGE_BR_BRIDGE=n
++ MANAGE_PRO_BRIDGE=y
++ MANAGE_INT_BRIDGE=y
++ INT_IF=eth0
++ ROOT_DISK_NAME=/dev/sda
++ FILESYSTEM=/
++ WORKING_DIR=/opt/dev-scripts
++ NODES_FILE=/home/centos/instackenv.json
++ NODES_PLATFORM=baremetal
++ MASTER_NODES_FILE=ocp/master_nodes.json
++ export NUM_MASTERS=3
++ NUM_MASTERS=3
++ export NUM_WORKERS=3
++ NUM_WORKERS=3
++ export VM_EXTRADISKS=false
++ VM_EXTRADISKS=false
++ export RHCOS_INSTALLER_IMAGE_URL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190508.1/
++ RHCOS_INSTALLER_IMAGE_URL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190508.1/
++ export RHCOS_IMAGE_URL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190508.1/
++ RHCOS_IMAGE_URL=https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190508.1/
+++ curl https://releases-art-rhcos.svc.ci.openshift.org/art/storage/releases/ootpa/410.8.20190508.1//meta.json
+++ jq -r .images.openstack.path
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5154  100  5154    0     0  13244      0 --:--:-- --:--:-- --:--:-- 13249
++ export RHCOS_IMAGE_FILENAME_OPENSTACK_GZ=rhcos-410.8.20190508.1-openstack.qcow2
++ RHCOS_IMAGE_FILENAME_OPENSTACK_GZ=rhcos-410.8.20190508.1-openstack.qcow2
+++ echo rhcos-410.8.20190508.1-openstack.qcow2
+++ sed -e 's/-openstack.*//'
++ export RHCOS_IMAGE_NAME=rhcos-410.8.20190508.1
++ RHCOS_IMAGE_NAME=rhcos-410.8.20190508.1
++ export RHCOS_IMAGE_FILENAME_OPENSTACK=rhcos-410.8.20190508.1-openstack.qcow2
++ RHCOS_IMAGE_FILENAME_OPENSTACK=rhcos-410.8.20190508.1-openstack.qcow2
++ export RHCOS_IMAGE_FILENAME_COMPRESSED=rhcos-410.8.20190508.1-compressed.qcow2
++ RHCOS_IMAGE_FILENAME_COMPRESSED=rhcos-410.8.20190508.1-compressed.qcow2
++ export RHCOS_IMAGE_FILENAME_LATEST=rhcos-ootpa-latest.qcow2
++ RHCOS_IMAGE_FILENAME_LATEST=rhcos-ootpa-latest.qcow2
++ export IRONIC_IMAGE=quay.io/metal3-io/ironic:master
++ IRONIC_IMAGE=quay.io/metal3-io/ironic:master
++ export IRONIC_INSPECTOR_IMAGE=quay.io/metal3-io/ironic-inspector:master
++ IRONIC_INSPECTOR_IMAGE=quay.io/metal3-io/ironic-inspector:master
++ export IRONIC_DATA_DIR=/opt/dev-scripts/ironic
++ IRONIC_DATA_DIR=/opt/dev-scripts/ironic
++ export KUBECONFIG=/home/centos/dev-scripts/ocp/auth/kubeconfig
++ KUBECONFIG=/home/centos/dev-scripts/ocp/auth/kubeconfig
++ export 'SSH=ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5'
++ SSH='ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5'
++ export LIBVIRT_DEFAULT_URI=qemu:///system
++ LIBVIRT_DEFAULT_URI=qemu:///system
++ '[' centos '!=' root -a /run/user/1000 == /run/user/0 ']'
++ sudo -n uptime
+++ awk -F= '/^ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
++ [[ ! centos =~ ^(centos|rhel)$ ]]
+++ awk -F= '/^VERSION_ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
+++ cut -f1 -d.
++ [[ 7 -ne 7 ]]
+++ df / --output=fstype
+++ grep -v Type
++ FSTYPE=xfs
++ case ${FSTYPE} in
+++ xfs_info /
+++ grep -q ftype=1
++ [[ -n '' ]]
++ '[' 3955 = 0 ']'
++ '[' '!' -d /opt/dev-scripts ']'
++ sudo chown centos:centos /opt/dev-scripts
++ chmod 755 /opt/dev-scripts
+ figlet 'Deploying rook'
+ lolcat
 ____             _             _                               _
|  _ \  ___ _ __ | | ___  _   _(_)_ __   __ _   _ __ ___   ___ | | __
| | | |/ _ \ '_ \| |/ _ \| | | | | '_ \ / _` | | '__/ _ \ / _ \| |/ /
| |_| |  __/ |_) | | (_) | |_| | | | | | (_| | | | | (_) | (_) |   <
|____/ \___| .__/|_|\___/ \__, |_|_| |_|\__, | |_|  \___/ \___/|_|\_\
           |_|            |___/         |___/
++ go env
+ eval 'GOARCH="amd64"
GOBIN=""
GOCACHE="/home/centos/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/centos/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build548331645=/tmp/go-build -gno-record-gcc-switches"'
++ GOARCH=amd64
++ GOBIN=
++ GOCACHE=/home/centos/.cache/go-build
++ GOEXE=
++ GOFLAGS=
++ GOHOSTARCH=amd64
++ GOHOSTOS=linux
++ GOOS=linux
++ GOPATH=/home/centos/go
++ GOPROXY=
++ GORACE=
++ GOROOT=/usr/lib/golang
++ GOTMPDIR=
++ GOTOOLDIR=/usr/lib/golang/pkg/tool/linux_amd64
++ GCCGO=gccgo
++ CC=gcc
++ CXX=g++
++ CGO_ENABLED=1
++ GOMOD=
++ CGO_CFLAGS='-g -O2'
++ CGO_CPPFLAGS=
++ CGO_CXXFLAGS='-g -O2'
++ CGO_FFLAGS='-g -O2'
++ CGO_LDFLAGS='-g -O2'
++ PKG_CONFIG=pkg-config
++ GOGCCFLAGS='-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build548331645=/tmp/go-build -gno-record-gcc-switches'
+ ROOK_VERSION=v0.9.0-519.g111610e
+ GIT_VERSION=111610e50f942c84ddc3523b4bf7b57858c19b19
+ export MIXINPATH=/home/centos/go/src/github.com/ceph/ceph-mixins
+ MIXINPATH=/home/centos/go/src/github.com/ceph/ceph-mixins
+ export ROOKPATH=/home/centos/go/src/github.com/rook/rook
+ ROOKPATH=/home/centos/go/src/github.com/rook/rook
+ cd /home/centos/go/src/github.com/rook/rook/cluster/examples/kubernetes/ceph
+ git checkout 111610e50f942c84ddc3523b4bf7b57858c19b19
Note: checking out '111610e50f942c84ddc3523b4bf7b57858c19b19'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at 111610e... Merge pull request #3066 from noahdesu/skip-early-orch
+ sed 's/name: rook-ceph$/name: openshift-storage/' common.yaml
+ sed -i 's/namespace: rook-ceph/namespace: openshift-storage/' common-modified.yaml
+ oc create -f common-modified.yaml
namespace/openshift-storage created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/volumes.rook.io created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt-rules created
role.rbac.authorization.k8s.io/rook-ceph-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global-rules created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster-rules created
serviceaccount/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system-rules created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
+ oc label namespace openshift-storage openshift.io/cluster-monitoring=true
namespace/openshift-storage labeled
+ oc policy add-role-to-user view system:serviceaccount:openshift-monitoring:prometheus-k8s -n openshift-storage
clusterrole.rbac.authorization.k8s.io/view added: "system:serviceaccount:openshift-monitoring:prometheus-k8s"
+ sed 's/namespace: rook-ceph/namespace: openshift-storage/' operator-openshift.yaml
+ sed -i s/:rook-ceph:/:openshift-storage:/ operator-openshift-modified.yaml
+ sed -i s@rook/ceph:master@rook/ceph:v0.9.0-519.g111610e@ operator-openshift-modified.yaml
+ sed -i '/ROOK_MON_HEALTHCHECK_INTERVAL/!b;n;c\          value: "30s"' operator-openshift-modified.yaml
+ sed -i '/ROOK_MON_OUT_TIMEOUT/!b;n;c\          value: "40s"' operator-openshift-modified.yaml
+ oc create -f operator-openshift-modified.yaml
securitycontextconstraints.security.openshift.io/rook-ceph created
deployment.apps/rook-ceph-operator created
+ sleep 10
+ oc wait --for condition=ready pod -l app=rook-ceph-operator -n openshift-storage --timeout=1200s
pod/rook-ceph-operator-ddf6764c7-twwmc condition met
+ oc wait --for condition=ready pod -l app=rook-ceph-agent -n openshift-storage --timeout=1200s
pod/rook-ceph-agent-c7gzv condition met
pod/rook-ceph-agent-fw7sn condition met
pod/rook-ceph-agent-jvhmj condition met
pod/rook-ceph-agent-qz56v condition met
pod/rook-ceph-agent-sf6nk condition met
pod/rook-ceph-agent-t8jnf condition met
+ oc wait --for condition=ready pod -l app=rook-discover -n openshift-storage --timeout=1200s
pod/rook-discover-hlnv8 condition met
pod/rook-discover-lx6xq condition met
pod/rook-discover-ng45b condition met
pod/rook-discover-q2v2v condition met
pod/rook-discover-wzbtt condition met
pod/rook-discover-xq4hn condition met
+ sed 's/# port: 8443/port: 8444/' cluster.yaml
+ sed -i 's/namespace: rook-ceph/namespace: openshift-storage/' cluster-modified.yaml
+ sed -i 's/allowUnsupported: false/allowUnsupported: true/' cluster-modified.yaml
+ oc create -f cluster-modified.yaml
cephcluster.ceph.rook.io/rook-ceph created
+ sed 's/namespace: rook-ceph/namespace: openshift-storage/' toolbox.yaml
+ sed -i s@rook/ceph:master@rook/ceph:v0.9.0-519.g111610e@ toolbox-modified.yaml
+ oc create -f toolbox-modified.yaml
deployment.apps/rook-ceph-tools created
+ sleep 10
+ oc wait --for condition=ready pod -l app=rook-ceph-tools -n openshift-storage --timeout=1200s
error: timed out waiting for the condition

Additional context

[centos@provisionhost-0 dev-scripts]$ oc get pods -l app=rook-ceph-tools -n openshift-storage 
NAME                              READY   STATUS              RESTARTS   AGE
rook-ceph-tools-9d9f547df-xb9x9   0/1     ContainerCreating   0          23m
[centos@provisionhost-0 dev-scripts]$ oc -n openshift-storage  describe pod rook-ceph-tools-9d9f547df-xb9x9
Name:               rook-ceph-tools-9d9f547df-xb9x9
Namespace:          openshift-storage
Priority:           0
PriorityClassName:  <none>
Node:               rhhi-node-5/192.168.123.119
Start Time:         Mon, 03 Jun 2019 20:46:24 +0000
Labels:             app=rook-ceph-tools
                    pod-template-hash=9d9f547df
Annotations:        openshift.io/scc: rook-ceph
Status:             Pending
IP:                 192.168.123.119
Controlled By:      ReplicaSet/rook-ceph-tools-9d9f547df
Containers:
  rook-ceph-tools:
    Container ID:  
    Image:         rook/ceph:v0.9.0-519.g111610e
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      /tini
    Args:
      -g
      --
      /usr/local/bin/toolbox.sh
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      ROOK_ADMIN_SECRET:  <set to the key 'admin-secret' in secret 'rook-ceph-mon'>  Optional: false
    Mounts:
      /dev from dev (rw)
      /etc/rook from mon-endpoint-volume (rw)
      /lib/modules from libmodules (rw)
      /sys/bus from sysbus (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-2jtzc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:  
  sysbus:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/bus
    HostPathType:  
  libmodules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  mon-endpoint-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rook-ceph-mon-endpoints
    Optional:  false
  default-token-2jtzc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-2jtzc
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                 From                  Message
  ----     ------            ----                ----                  -------
  Normal   Scheduled         23m                 default-scheduler     Successfully assigned openshift-storage/rook-ceph-tools-9d9f547df-xb9x9 to rhhi-node-5
  Warning  FailedScheduling  23m                 default-scheduler     Binding rejected: Operation cannot be fulfilled on pods/binding "rook-ceph-tools-9d9f547df-xb9x9": pod rook-ceph-tools-9d9f547df-xb9x9 is already assigned to node "rhhi-node-5"
  Warning  FailedMount       74s (x10 over 21m)  kubelet, rhhi-node-5  Unable to mount volumes for pod "rook-ceph-tools-9d9f547df-xb9x9_openshift-storage(a676257c-8640-11e9-a77c-5254003d16f6)": timeout expired waiting for volumes to attach or mount for pod "openshift-storage"/"rook-ceph-tools-9d9f547df-xb9x9". list of unmounted volumes=[mon-endpoint-volume]. list of unattached volumes=[dev sysbus libmodules mon-endpoint-volume default-token-2jtzc]
  Warning  FailedMount       72s (x19 over 23m)  kubelet, rhhi-node-5  MountVolume.SetUp failed for volume "mon-endpoint-volume" : configmaps "rook-ceph-mon-endpoints" not found

kubelet.log: https://paste.fedoraproject.org/paste/GNVNJWjZBYn2M~FdlIZuYQ
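
The FailedMount events above name the blocker directly: the rook-ceph-mon-endpoints ConfigMap does not exist, so the toolbox pod cannot leave ContainerCreating until it is created. A minimal sketch for pulling such names out of `oc describe pod` output is shown here; `missing_configmaps` is a hypothetical helper name, not something dev-scripts provides.

```shell
#!/usr/bin/env bash
# missing_configmaps (hypothetical helper): read `oc describe pod` output on
# stdin and print each ConfigMap name that an event reports as "not found".
missing_configmaps() {
    grep -o 'configmaps "[^"]*" not found' | cut -d'"' -f2 | sort -u
}

# Possible usage, assuming oc is logged in to the cluster:
#   oc -n openshift-storage describe pod rook-ceph-tools-9d9f547df-xb9x9 | missing_configmaps
```

An empty result would mean the pod is waiting on something other than a missing ConfigMap.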

dantrainor commented 5 years ago

I ran into this earlier today. When I ran:

oc wait --for condition=ready pod -l app=rook-ceph-tools -n openshift-storage --timeout=1200s

...by hand, the command completed successfully.

I'm now attempting another redeploy.
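
If the failure is only a matter of timing, as the successful manual re-run suggests, one option is to retry the wait a few times before giving up. The following is a sketch under that assumption; `retry` is a hypothetical helper, not part of dev-scripts, and it would only mask the problem if the pod is blocked on something that never appears.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY_SECONDS between tries. Returns 1 if every attempt fails.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Possible usage with the command from this issue:
#   retry 3 30 oc wait --for condition=ready pod -l app=rook-ceph-tools \
#       -n openshift-storage --timeout=1200s
```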

mcornea commented 5 years ago

On a different environment where I hit the same issue (the rook-ceph-tools container not getting created), I spotted the following error in the rook-ceph-operator log, so it may be related to the certificates issue:

[root@rhhi-node-3 core]# tail -10 /var/log/containers/rook-ceph-operator-ddf6764c7-pfn2v_openshift-storage_rook-ceph-operator-f9dbc860b7eba7a1e58eed818896298b21ac817e8e130bdae825273b018b85c4.log 
2019-06-03T23:19:10.010074349+00:00 stderr F 2019-06-03 23:19:10.010023 W | op-k8sutil: OwnerReferences will not be set on resources created by rook. failed to test that it can be set. configmaps "rook-test-ownerref" is forbidden: cannot set blockOwnerDeletion in this case because cannot find RESTMapping for APIVersion v1 Kind CephCluster: no matches for kind "CephCluster" in version "v1"
2019-06-03T23:19:10.028909507+00:00 stderr F 2019-06-03 23:19:10.028846 I | op-k8sutil: waiting for job rook-ceph-detect-version to complete...
2019-06-03T23:19:40.051012753+00:00 stderr F 2019-06-03 23:19:40.050907 E | op-cluster: unknown ceph major version. failed to get version job log to detect version. failed to read from stream. Get https://rhhi-node-5:10250/containerLogs/openshift-storage/rook-ceph-detect-version-rh9gm/version: remote error: tls: internal error
2019-06-03T23:23:52.405140401+00:00 stderr F W0603 23:23:52.405042       8 reflector.go:289] github.com/rook/rook/pkg/operator/ceph/cluster/controller.go:165: watch of *v1.ConfigMap ended with: too old resource version: 21837 (23160)
2019-06-03T23:23:53.408539057+00:00 stderr F 2019-06-03 23:23:53.408467 I | op-cluster: device lists are equal. skipping orchestration
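
The key line in this log is the `remote error: tls: internal error` returned when the operator requests container logs from the kubelet on port 10250. A small sketch for listing which nodes produce that error is given here; `tls_error_nodes` is a hypothetical helper name, and the pattern assumes the log lines have the same shape as the ones above.

```shell
#!/usr/bin/env bash
# tls_error_nodes (hypothetical helper): read operator log lines on stdin and
# print each node whose kubelet endpoint (port 10250) answered with a TLS error.
tls_error_nodes() {
    grep 'remote error: tls:' \
        | grep -o 'https://[^:/]*:10250' \
        | sed -e 's|^https://||' -e 's|:10250$||' \
        | sort -u
}

# Possible usage, assuming oc is logged in to the cluster:
#   oc -n openshift-storage logs deployment/rook-ceph-operator | tls_error_nodes
```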

itzikb-redhat commented 5 years ago

This also happened to me.

mcornea commented 5 years ago

FWIW, I've been able to work around this on my env by running the fix_certs script much more aggressively (every minute instead of every 30 minutes):

From b034967c002ace4339886f60deff4bcef186bbb7 Mon Sep 17 00:00:00 2001
From: Marius Cornea <mcornea@redhat.com>
Date: Wed, 29 May 2019 18:47:13 -0400
Subject: [PATCH] run fix_certs every minute

---
 06_create_cluster.sh | 2 +-
 10_deploy_rook.sh    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/06_create_cluster.sh b/06_create_cluster.sh
index d91d8b78..219cafa2 100755
--- a/06_create_cluster.sh
+++ b/06_create_cluster.sh
@@ -70,7 +70,7 @@ create_cluster ocp

 # Run the fix_certs.sh script periodically as a workaround for
 # https://github.com/openshift-metalkube/dev-scripts/issues/260
-sudo systemd-run --on-active=30s --on-unit-active=30m --unit=fix_certs.service $(dirname $0)/fix_certs.sh
+sudo systemd-run --on-active=30s --on-unit-active=1m --unit=fix_certs.service $(dirname $0)/fix_certs.sh

 # Update kube-system ep/host-etcd used by cluster-kube-apiserver-operator to
 # generate storageConfig.urls
diff --git a/10_deploy_rook.sh b/10_deploy_rook.sh
index 74d710d1..4e749f7d 100755
--- a/10_deploy_rook.sh
+++ b/10_deploy_rook.sh
@@ -45,6 +45,7 @@ sleep 10

 # enable pg_autoscaler
 oc wait --for condition=ready  pod -l app=rook-ceph-tools -n openshift-storage --timeout=1200s
+sleep 10
 oc wait --for condition=ready  pod -l app=rook-ceph-mon -n openshift-storage --timeout=1200s
 oc -n openshift-storage exec $(oc -n openshift-storage get pod --show-all=false -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph mgr module enable pg_autoscaler --force
 oc -n openshift-storage exec $(oc -n openshift-storage get pod --show-all=false -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- ceph config set global osd_pool_default_pg_autoscale_mode on
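
In the patch above, `--on-active=30s` schedules the first run of fix_certs.sh 30 seconds after the transient timer starts, and `--on-unit-active=` sets how long after each run the next one fires. Rather than hard-coding 1m, the interval could be made configurable. This is only a sketch: `FIX_CERTS_INTERVAL` and `fix_certs_timer_cmd` are hypothetical names that do not exist in dev-scripts, and the function prints the command instead of running it.

```shell
#!/usr/bin/env bash
# fix_certs_timer_cmd SCRIPT_DIR
# Hypothetical helper: print the systemd-run command line from
# 06_create_cluster.sh with a configurable repeat interval.
# FIX_CERTS_INTERVAL is an assumed variable; it defaults to the original 30m.
fix_certs_timer_cmd() {
    local script_dir=$1
    local interval=${FIX_CERTS_INTERVAL:-30m}
    printf 'sudo systemd-run --on-active=30s --on-unit-active=%s --unit=fix_certs.service %s/fix_certs.sh\n' \
        "$interval" "$script_dir"
}

# Possible usage: print the command for a one-minute interval.
#   FIX_CERTS_INTERVAL=1m fix_certs_timer_cmd "$(dirname "$0")"
```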

russellb commented 5 years ago

I wouldn't burn any more energy on this. See #603 - this doesn't match how we expect these components to be run in the long term. There will be another operator that manages all of this. We're just going to drop the demo integration from dev-scripts for now.