splunk / splunk-operator

Splunk Operator for Kubernetes
Other
210 stars 115 forks source link

Splunk Operator: controller doesn't detect if statefulset deleted for standalone type #1284

Closed yaroslav-nakonechnikov closed 4 months ago

yaroslav-nakonechnikov commented 9 months ago

Please select the type of request

Bug

Tell us more

Describe the request if there are some changes done in stateful set for standalone instance - controlled doesn't recreate it as it is done for other types

Expected behavior controller creates statefulset from crd

yaroslav-nakonechnikov commented 9 months ago
$ kubectl get standalone
NAME     PHASE   DESIRED   READY   AGE
c-prod                             23m

in description there was warning: Warning validateStandaloneSpec 20m splunk-standalone-controller validate standalone spec failed invalid Volume Name for App Source: custom. volume: csh, doesn't exist

but, it shouldn't be the case, as before deleting statefulset it was working

vivekr-splunk commented 9 months ago

can you add CR spec here, we will retry on our end and let you know

yaroslav-nakonechnikov commented 9 months ago

broken example, which creates warning and blocking creating resources:

 appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: custom/
      name: custom
      scope: local
      volumeName: csh
    - location: base/
      name: base
      scope: local
      volumeName: csh
    appsRepoPollIntervalSeconds: 120
    installMaxRetries: 3
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: splunkapps
      path: bucketname/splunk-apps/csh
      provider: aws
      region: eu-central-1
      storageType: s3

correct example:

 appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: custom/
      name: custom
      scope: local
      volumeName: splunkapps
    - location: base/
      name: base
      scope: local
      volumeName: splunkapps
    appsRepoPollIntervalSeconds: 120
    installMaxRetries: 3
    volumes:
    - endpoint: https://s3-eu-central-1.amazonaws.com
      name: splunkapps
      path: bucketname/splunk-apps/csh
      provider: aws
      region: eu-central-1
      storageType: s3

so, all in all, expectation that warning doesn't block anything, but feature just not working. or, it should be error - and in that case it will be expected.

akondur commented 6 months ago

Hi @yaroslav-nakonechnikov , in every custom resource's reconcile logic we do a validation of its spec and error out if any CR spec is erroneous(same for all CRs).

For standalone we validate its spec here. Another example of an indexer cluster here.

As mentioned here already, Splunk operator records a Warning event in the standalone CR indicating incorrect volume for the app source custom. Currently, only the following event types: {Normal, Warning} are supported as mentioned in the golang documentation here. In the future, if they introduce an {Error} type we will make sure to mark the event as an error.

Additionally we log an error here to indicate an erroneous config.

Please let us know if the explanation above is acceptable.

akondur commented 6 months ago

Hi @yaroslav-nakonechnikov , please let us know if the explanation above is acceptable.

yaroslav-nakonechnikov commented 6 months ago

explanation doesn't fix problem.

i expect that controller overrides stateful set (and or recreate it) in case of any changes in statefulset.

otherwise, behavior is not consistent and leads to additional issues, which are not expected.

akondur commented 6 months ago

Hey @yaroslav-nakonechnikov , currently we are looking to set the Phase as Error in the CR status in such scenarios to indicate that the CR spec is invalid i.e

NAME     PHASE   DESIRED   READY   AGE
c-prod   Error                     23m

In the future we are looking to add a Message field in the CR status to indicate such errors i.e

NAME     PHASE   DESIRED   READY   AGE   MESSAGE
c-prod   Error                     23m   validateStandaloneSpec: validate standalone spec failed invalid Volume Name for App Source: custom. volume: csh, doesn't exist

Please let us know if that solution works.

yaroslav-nakonechnikov commented 6 months ago

yes, adding nice log messages will help a lot!

akondur commented 4 months ago

Hey @yaroslav-nakonechnikov , can we close this issue as we have added a message in the CR status?

yaroslav-nakonechnikov commented 4 months ago

yes, for now good to be closed