turbonomic / t8c-install

Cannot deploy 8.3.4 on Openshift 4.7.32 #18

Closed fckbo closed 2 years ago

fckbo commented 2 years ago

Problem Description

Hello, I followed the instructions here: https://github.com/turbonomic/t8c-install/blob/master/DEPLOY.md

kubectl create ns turbonomic
kubectl create -f https://raw.githubusercontent.com/turbonomic/t8c-install/master/operator/deploy/service_account.yaml -n turbonomic
kubectl create -f https://raw.githubusercontent.com/turbonomic/t8c-install/master/operator/deploy/role.yaml -n turbonomic
kubectl create -f https://raw.githubusercontent.com/turbonomic/t8c-install/master/operator/deploy/role_binding.yaml -n turbonomic
kubectl create -f https://raw.githubusercontent.com/turbonomic/t8c-install/master/operator/deploy/crds/charts_v1alpha1_xl_crd.yaml
kubectl create -f https://raw.githubusercontent.com/turbonomic/t8c-install/master/operator/deploy/operator.yaml -n turbonomic
oc adm policy add-scc-to-group anyuid system:serviceaccounts:turbonomic
kubectl apply -f https://raw.githubusercontent.com/turbonomic/t8c-install/master/operator/deploy/crds/charts_v1alpha1_xl_cr.yaml -n turbonomic

Some of the pods start correctly but some do not:

oc get pods
NAME                                   READY   STATUS             RESTARTS   AGE
action-orchestrator-7bf9b98fdc-7v9x6   0/1     Running            1          66m
api-5cd756c595-shd2p                   0/1     Running            1          66m
auth-6cb8796fd7-r222k                  0/1     Running            1          66m
clustermgr-686c98b5c-qc4xt             0/1     CrashLoopBackOff   17         66m
consul-856f9c5bbd-qlpcs                1/1     Running            0          66m
cost-5ffc45dd88-f7dqg                  0/1     Running            1          66m
db-7c789b97f5-hfdh6                    1/1     Running            0          66m
group-74d5d57d99-g74nm                 0/1     Running            1          66m
history-85978fd497-r9h2m               0/1     Running            0          66m
kafka-7494f4ffdc-qxjxk                 1/1     Running            1          66m
market-646668447f-gv9sz                0/1     Running            1          66m
nginx-77945cb484-6qpfh                 1/1     Running            0          66m
plan-orchestrator-949477bcd-mr4ws      0/1     Running            1          66m
repository-74b4b448c4-nhpx2            0/1     Running            1          66m
rsyslog-8588f898b7-kbvlz               1/1     Running            0          66m
t8c-operator-85954f4cd6-8xg6p          1/1     Running            0          4h7m
topology-processor-f6f578bc6-7bzll     0/1     Running            1          66m
ui-5fcb6448f4-df5xq                    1/1     Running            0          66m
zookeeper-868d99f984-p66qt             1/1     Running            0          66m

Outcome expected:

All Turbonomic pods are 1/1 Running
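
To pick out the pods that miss this expectation, the `oc get pods` listing can be filtered on its READY column. The helper below is a minimal sketch and not part of the Turbonomic tooling: the function name `not_ready_pods` is hypothetical, and it assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
# Print the name and status of every pod whose READY column is not N/N.
# Reads `oc get pods` style output on stdin and skips the header row.
# (not_ready_pods is a hypothetical helper name, not an oc subcommand.)
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1, $3 }'
}

# Usage against a live cluster (namespace taken from the commands above):
#   oc get pods -n turbonomic | not_ready_pods
```

An empty result means every listed pod reports all of its containers ready.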

Additional info

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.32    True        False         27d     Cluster version is 4.7.32
oc get nodes
NAME                            STATUS   ROLES    AGE   VERSION
master1.hmcmp03.nca.ihost.com   Ready    master   28d   v1.20.0+bbbc079
master2.hmcmp03.nca.ihost.com   Ready    master   28d   v1.20.0+bbbc079
master3.hmcmp03.nca.ihost.com   Ready    master   28d   v1.20.0+bbbc079
worker1.hmcmp03.nca.ihost.com   Ready    worker   27d   v1.20.0+bbbc079
worker2.hmcmp03.nca.ihost.com   Ready    worker   27d   v1.20.0+bbbc079
worker3.hmcmp03.nca.ihost.com   Ready    worker   27d   v1.20.0+bbbc079

t8c-l1.log

I'm attaching the log of the rsyslog-8588f898b7-kbvlz pod to provide more details.

esara commented 2 years ago

@fckbo thanks for trying out the platform. I see that clustermgr is failing with an initialized error, but all of the dependencies should have resolved automatically within minutes (not the hour you have been waiting). One question: what is the underlying storage provider / default storage class? Are you using block storage?
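
The default storage class asked about here can be read from `oc get storageclass`, which marks the default class with `(default)` after its name. The helper below is a sketch under that assumption; the function name `default_storage_class` is hypothetical.

```shell
# Print the name and provisioner of the default storage class.
# Reads `oc get storageclass` output on stdin; the default class is the
# row whose name is followed by the "(default)" marker.
# (default_storage_class is a hypothetical helper name.)
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1, $3 }'
}

# Usage against a live cluster:
#   oc get storageclass | default_storage_class
```

No output means the cluster has no default storage class set.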

fckbo commented 2 years ago

Hello, thx for your prompt reply. I'm using nfs-provisioner with NFS storage.

fckbo commented 2 years ago

In the meantime, I tried another way to install it, using the OpenShift console rather than the commands above from your GitHub README. I did the following:

1) I deployed the Turbonomic Platform Operator from OperatorHub.
2) I then created an XL instance with OpenShift Ingress enabled.
3) I had to do it twice: the first attempt did not work, and I had to run the same "add-scc-to-group" command as the one mentioned above to get it to work.

oc adm policy add-scc-to-group anyuid system:serviceaccounts:<my-namespace>

And then after a few minutes it worked.

oc get pod
NAME                                   READY   STATUS    RESTARTS   AGE
action-orchestrator-5fd48678f6-24rx4   1/1     Running   0          95m
api-5fdc46bc57-mddvn                   1/1     Running   0          95m
auth-5d769c9f54-q688b                  1/1     Running   0          95m
clustermgr-77c5997b6-2gxwz             1/1     Running   0          95m
consul-6d67896779-rt7jq                1/1     Running   0          95m
cost-7fcbb47584-ljlmw                  1/1     Running   0          95m
db-6dc8d5b6-c45cj                      1/1     Running   0          95m
group-7698777c94-zmjlf                 1/1     Running   0          95m
history-5799f865cb-pr9j5               1/1     Running   0          95m
kafka-5b58c46bcb-74bbl                 1/1     Running   1          95m
market-59587cf5b-2qf7g                 1/1     Running   0          95m
nginx-77c76c645f-rbnmg                 1/1     Running   0          95m
plan-orchestrator-674d9cc776-rmz84     1/1     Running   0          95m
repository-5466f87569-mvlfv            1/1     Running   0          95m
rsyslog-76f445d5f8-drvk2               1/1     Running   0          95m
t8c-operator-6db7ff7d59-pfhzm          1/1     Running   0          31m
topology-processor-7c6c5967fc-lqz6m    1/1     Running   0          95m
ui-79479bd987-25jvv                    1/1     Running   0          95m
zookeeper-6b7c4966fc-jlhg2             1/1     Running   0          95m

I noticed that in this case version 8.3.5 was installed...

I'm still interested in understanding why the CLI approach taken from your GitHub repo does not work and seems to install a previous version, 8.3.4. The reason is that I intend to develop some Ansible automation to perform install and uninstall, so the CLI approach is what I need.

Thx

esara commented 2 years ago

I would strongly suggest deploying the Turbonomic Platform operator from the Red Hat Certified OperatorHub (even though it is the same operator binary you would get by deploying the operator manually from GitHub).

8.3.5 was released as GA today, so that is what we wish all customers to run as of today (although the same operator should have installed 8.3.4 or future versions, as well as upgrades from 8.3.4/earlier to 8.3.5 and later, without a problem).

If you want to automate the installation on OCP, please automate it using the certified OperatorHub (this does not need to be done through the console). For example, we are automating the install with something like:

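# NS must hold the target namespace; abort early if it is unset.
: "${NS:?set NS to the target namespace}"

# Install Turbonomic Platform Operator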
echo "Installing Turbonomic Platform Operator"
cat << EOF | oc -n ${NS} apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  annotations:
    olm.providedAPIs: Xl.v1.charts.helm.k8s.io
  name: turbonomic-mkk5d
  namespace: ${NS}
spec:
  targetNamespaces:
  - ${NS}
EOF

cat << EOF | oc -n ${NS} apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/t8c-certified.turbonomic: ""
  name: t8c-certified
  namespace: ${NS}
spec:
  name: t8c-certified
  source: certified-operators
  sourceNamespace: openshift-marketplace
EOF
# Wait until the operator has registered the Xl CRD.
until oc get crd xls.charts.helm.k8s.io > /dev/null 2>&1; do sleep 5; done

#Install Turbonomic Platform 
echo "Installing Turbonomic Platform"
cat << EOF | oc -n ${NS} apply -f -
apiVersion: charts.helm.k8s.io/v1
kind: Xl
metadata:
  name: xl-release
spec:
  global:
    repository: registry.connect.redhat.com/turbonomic
    customImageNames: false
    tag: 8.3.5
  nginxingress:
    enabled: false
  openshiftingress:
    enabled: true
  ui:
    enabled: false
EOF
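
The `until` loop in the script above polls for the CRD indefinitely, so an unattended run would hang if the operator never installs. A bounded variant could look like the sketch below; `wait_for` is a hypothetical helper, and the 600-second limit in the usage line is an assumed value, not one taken from this thread.

```shell
# wait_for LIMIT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds.
# Returns 0 on success, 1 once at least LIMIT seconds have been slept.
# (wait_for is a hypothetical helper, not part of oc or the operator.)
wait_for() {
  limit=$1
  interval=$2
  shift 2
  elapsed=0
  until "$@" > /dev/null 2>&1; do
    if [ "$elapsed" -ge "$limit" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Usage (assumed 10-minute limit), mirroring the CRD wait in the script:
#   wait_for 600 5 oc get crd xls.charts.helm.k8s.io || exit 1
```

The same helper can wrap any readiness check that exits non-zero until the condition holds, which lets automation fail with a clear exit code instead of waiting forever.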

fckbo commented 2 years ago

@esara thx a lot