yaroslav-nakonechnikov closed this issue 4 months ago.
So, I tried downgrading from 2.5.1 (and 2.5.0) to 2.4.0, and it worked, which I actually did not expect at all.
Hi @yaroslav-nakonechnikov, greetings. When you say it worked after downgrading to 2.4.0, can you elaborate on what worked? Are you saying the phase changed to "Ready"?
@vivekr-splunk Yes, the phase changed to Ready, and site4 was recreated, as expected.
This is starting to become critical.
The IndexerCluster CRs are created:
[yn@ip-10-216-35-48 /]$ kubectl get indexerclusters -n splunk-operator
NAME PHASE MASTER MANAGER DESIRED READY AGE
site1-32002 Error Ready 3 0 77m
site2-32002 Error Ready 3 0 77m
site3-32002 Error Ready 3 0 77m
site4-32002 Error Ready 3 0 77m
site5-32002 Error Ready 3 0 77m
site6-32002 Error Ready 3 5 5d4h
In the operator manager logs I see the following:
2024-03-19T13:49:32.244551457Z INFO start {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "indexercluster": "splunk-operator/site5-32002", "CR version": "4044269"}
2024-03-19T13:49:32.244647588Z INFO ApplyConfigMap No changes for ConfigMap {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "splunk-site5-32002-indexer-defaults", "namespace": "splunk-operator"}
2024-03-19T13:49:32.413422389Z INFO ApplyService No update to existing Service {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "splunk-site5-32002-indexer-headless", "namespace": "splunk-operator"}
2024-03-19T13:49:32.413454413Z INFO ApplyService No update to existing Service {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "splunk-site5-32002-indexer-service", "namespace": "splunk-operator"}
2024-03-19T13:49:32.4136215Z INFO ApplyConfigMap No changes for ConfigMap {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "splunk-splunk-operator-probe-configmap", "namespace": "splunk-operator"}
2024-03-19T13:49:32.413793087Z INFO getLivenessProbe LivenessProbe {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "site5-32002", "namespace": "splunk-operator", "Configured": "&Probe{ProbeHandler:ProbeHandler{Exec:&ExecAction{Command:[/mnt/probes/livenessProbe.sh],},HTTPGet:nil,TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:240,TimeoutSeconds:30,PeriodSeconds:30,SuccessThreshold:0,FailureThreshold:60,TerminationGracePeriodSeconds:nil,}"}
2024-03-19T13:49:32.413815103Z INFO getReadinessProbe ReadinessProbe {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "site5-32002", "namespace": "splunk-operator", "Configured": "&Probe{ProbeHandler:ProbeHandler{Exec:&ExecAction{Command:[/mnt/probes/readinessProbe.sh],},HTTPGet:nil,TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:240,TimeoutSeconds:30,PeriodSeconds:30,SuccessThreshold:0,FailureThreshold:60,TerminationGracePeriodSeconds:nil,}"}
2024-03-19T13:49:32.41382608Z INFO getStartupProbe StartupProbe {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "site5-32002", "namespace": "splunk-operator", "Configured": "&Probe{ProbeHandler:ProbeHandler{Exec:&ExecAction{Command:[/mnt/probes/startupProbe.sh],},HTTPGet:nil,TCPSocket:nil,GRPC:nil,},InitialDelaySeconds:40,TimeoutSeconds:30,PeriodSeconds:60,SuccessThreshold:0,FailureThreshold:60,TerminationGracePeriodSeconds:nil,}"}
2024-03-19T13:49:32.41399514Z INFO isClusterManagerReadyForUpgrade kind is set to {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "name": "site5-32002", "namespace": "splunk-operator", "kind": "IndexerCluster"}
2024-03-19T13:49:32.414144671Z INFO updateCRStatus Trying to update {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "original cr version": "4044269", "count": 0}
2024-03-19T13:49:32.428603698Z INFO updateCRStatus Status update successful {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "original cr version": "4044269", "current CR version": "4044269", "updated CR version":"4044269"}
2024-03-19T13:49:32.428634239Z INFO updateCRStatus Cache is reflecting the latest CR {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "original cr version": "4044269", "updated CR version": "4044269"}
2024-03-19T13:49:32.428639088Z INFO Requeued {"controller": "indexercluster", "controllerGroup": "enterprise.splunk.com", "controllerKind": "IndexerCluster", "IndexerCluster": {"name":"site5-32002","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "site5-32002", "reconcileID": "34cb7d35-7ea9-4608-9181-0380bf16d8aa", "indexercluster": "splunk-operator/site5-32002", "period(seconds)": 5}
But the indexer StatefulSets are not created:
[yn@ip-10-216-35-48 /]$ kubectl get statefulset -n splunk-operator
NAME READY AGE
splunk-32002-cluster-manager 1/1 41m
splunk-32002-license-manager 1/1 62m
splunk-32002-monitoring-console 0/1 24m
[yn@ip-10-216-35-48 /]$
Why? What is wrong?
@yaroslav-nakonechnikov, we acknowledge that error codes and messages are not clearly documented. We're currently planning to revamp the error handling process and include error messages in the status to provide a clearer explanation of why the reconciliation fails. We'll keep you informed once this is completed.
Hey @yaroslav-nakonechnikov, this MR adds a message field to the CR status section with details of the error. Can you please try it and let us know whether this solution works?
I believe it can be closed for now, as the documentation has been updated. If new questions come up, a new ticket will be raised.
Yes, now it is much better!
$ kubectl get indexerclusters.enterprise.splunk.com -n splunk-operator
NAME PHASE MASTER MANAGER DESIRED READY AGE MESSAGE
site1-43345 Error Ready 3 0 16m StatefulSet.apps "splunk-e-43345-search-head" not found
site2-43345 Error Ready 3 0 16m StatefulSet.apps "splunk-e-43345-search-head" not found
site3-43345 Error Error 3 0 16m StatefulSet.apps "splunk-43345-cluster-manager" not found
site4-43345 Error Ready 3 0 16m StatefulSet.apps "splunk-e-43345-search-head" not found
site5-43345 Error Error 3 0 16m StatefulSet.apps "splunk-43345-cluster-manager" not found
site6-43345 Error Ready 3 0 16m StatefulSet.apps "splunk-e-43345-search-head" not found
$ kubectl get indexerclusters.enterprise.splunk.com -n splunk-operator
NAME PHASE MASTER MANAGER DESIRED READY AGE MESSAGE
site1-43345 Error Ready 3 0 16m
site2-43345 Error Ready 3 0 16m
site3-43345 Error Error 3 0 16m StatefulSet.apps "splunk-43345-cluster-manager" not found
site4-43345 Error Ready 3 0 16m
site5-43345 Error Error 3 0 16m StatefulSet.apps "splunk-43345-cluster-manager" not found
site6-43345 Error Ready 3 0 16m
$ kubectl get indexerclusters.enterprise.splunk.com -n splunk-operator
NAME PHASE MASTER MANAGER DESIRED READY AGE MESSAGE
site1-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
site2-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
site3-43345 Error Error 3 0 17m StatefulSet.apps "splunk-43345-cluster-manager" not found
site4-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
site5-43345 Error Error 3 0 17m StatefulSet.apps "splunk-43345-cluster-manager" not found
site6-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
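With six sites, the new MESSAGE column makes it much easier to see why reconciliation fails. A minimal Python sketch like the one below could group the Error rows by message. It assumes the whitespace-collapsed output shown above, where every row has exactly six fields before MESSAGE (one of the MASTER/MANAGER columns is empty). parse_rows and group_errors_by_message are hypothetical helpers, not part of the Splunk Operator or kubectl.

```python
from collections import defaultdict

# Sample copied from the last `kubectl get indexerclusters` output above.
SAMPLE = """\
NAME PHASE MASTER MANAGER DESIRED READY AGE MESSAGE
site1-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
site2-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
site3-43345 Error Error 3 0 17m StatefulSet.apps "splunk-43345-cluster-manager" not found
site4-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
site5-43345 Error Error 3 0 17m StatefulSet.apps "splunk-43345-cluster-manager" not found
site6-43345 Error Ready 3 0 16m could not get cluster info from cluster manager
"""


def parse_rows(text):
    """Parse whitespace-collapsed `kubectl get indexerclusters` output.

    Assumption: every data row has exactly six fields before MESSAGE
    (name, phase, cluster manager phase, desired, ready, age), and
    MESSAGE may be empty or contain spaces.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 6)  # at most 7 parts; MESSAGE keeps its spaces
        if len(parts) < 6:
            continue  # ignore lines that do not look like data rows
        name, phase, manager_phase, desired, ready, age = parts[:6]
        rows.append({
            "name": name,
            "phase": phase,
            "manager_phase": manager_phase,  # assumed: cluster manager phase column
            "desired": int(desired),
            "ready": int(ready),
            "age": age,
            "message": parts[6] if len(parts) == 7 else "",
        })
    return rows


def group_errors_by_message(rows):
    """Map each status message to the names of clusters in the Error phase."""
    groups = defaultdict(list)
    for row in rows:
        if row["phase"] == "Error":
            groups[row["message"] or "(no message)"].append(row["name"])
    return dict(groups)


if __name__ == "__main__":
    for message, names in group_errors_by_message(parse_rows(SAMPLE)).items():
        print(f"{message}: {', '.join(names)}")
```

Parsing the table is fragile, since the column layout can change between operator versions; reading the status fields from `kubectl get ... -o json` output would be a more robust variant of the same idea.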
Please select the type of request
Enhancement
Tell us more
Describe the request
I need a better understanding of the status phases for the CRDs:
Ready: clear. Everything works as expected.
Pending: understandable, as the pod is not currently running.
Error: not clear at all. For example, I'd expect it to be Ready for 5 of the 6 records, but it says Error for each indexer resource. It doesn't give any error code, just "Phase: error".
Expected behavior
There is an article with a troubleshooting section that gives advice on checking these statuses.