splunk / splunk-operator

Splunk Operator for Kubernetes

App Framework: Updating the location of an app source is not triggering an update of the app #995

Open marcusschiesser opened 1 year ago

marcusschiesser commented 1 year ago

Please select the type of request

Bug

Tell us more

Describe the request

I am deploying a Standalone resource using the App Framework with the following configuration:

  appRepo:
    appsRepoPollIntervalSeconds: 600
    defaults:
      volumeName: splunk-apps
      scope: local
    appSources:
      - name: app
        location: myapp/v1.1.0/
    volumes:
      - name: splunk-apps
        storageType: s3
        provider: aws
        path: mybucket/
        endpoint: https://s3-eu-central-1.amazonaws.com
        secretRef: s3-splunk-apps-secret

The S3 folder myapp/v1.1.0/ contains an app named myapp.tgz.

When I change the location in the Standalone resource from myapp/v1.1.0/ to myapp/v1.1.1/ (this S3 folder contains a new version of the app myapp.tgz), the new version of the app is not installed.

Instead, the pod reinstalls the old version of the app.

The problem seems to be that the amazon/aws-cli init container mounted into the pod is still invoked with s3 sync s3://mybucket/myapp/v1.1.0/ /init-apps/app/ instead of s3 sync s3://mybucket/myapp/v1.1.1/ /init-apps/app/
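
For reference, this is roughly what the generated init container in the Standalone pod spec looks like under this configuration. This is a sketch only: the container and volume names are my assumptions, while the image and the sync arguments are what I observe.

    # Illustrative sketch of the operator-rendered init container; names are assumptions.
    initContainers:
      - name: init-app                        # hypothetical name
        image: amazon/aws-cli
        command:
          - aws
          - s3
          - sync
          - s3://mybucket/myapp/v1.1.0/       # still the old location after the CR change
          - /init-apps/app/
        volumeMounts:
          - name: init-apps                   # hypothetical volume name
            mountPath: /init-apps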

Expected behavior

Splunk setup on K8S

Reproduction/Testing steps

K8s environment

marcusschiesser commented 1 year ago

I tried a couple of workarounds:

  1. Deleting the StatefulSet - this doesn't work and brings the Standalone resource into an Error state

  2. Adding the version as a suffix to the name, e.g.

    appSources:
      - name: app-1-1-1
        location: myapp/v1.1.1/

    This crashes the operator with:

    
    2022-12-02T04:48:58.098Z        DPANIC  controller.standalone.handleAppRepoChanges      odd number of arguments passed as key-value pairs for logging     {"reconciler group": "enterprise.splunk.com", "reconciler kind": "Standalone", "name": "staging-playground", "namespace": "playground", "kind": "Standalone", "name": "staging-playground", "namespace": "playground", "ignored key": "Reason: App source is mising in config or remote listing"}
    github.com/splunk/splunk-operator/pkg/splunk/enterprise.initAndCheckAppInfoStatus
        /workspace/pkg/splunk/enterprise/util.go:1081
    github.com/splunk/splunk-operator/pkg/splunk/enterprise.ApplyStandalone
        /workspace/pkg/splunk/enterprise/standalone.go:82
    github.com/splunk/splunk-operator/controllers.(*StandaloneReconciler).Reconcile
        /workspace/controllers/standalone_controller.go:108
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:114
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:311
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:266
    sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:227
    panic: odd number of arguments passed as key-value pairs for logging

goroutine 622 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000262cc0, {0xc00076b440, 0x1, 0x1})
        /go/pkg/mod/go.uber.org/zap@v1.19.0/zapcore/entry.go:232 +0x446
go.uber.org/zap.(*Logger).DPanic(0x19ef7e7, {0x1a53aef, 0x174fdc0}, {0xc00076b440, 0x1, 0x1})
        /go/pkg/mod/go.uber.org/zap@v1.19.0/logger.go:220 +0x59
github.com/go-logr/zapr.handleFields(0xc00113f980, {0xc0012c98c0, 0x3, 0xa}, {0x0, 0x0, 0x174fb40})
        /go/pkg/mod/github.com/go-logr/zapr@v0.4.0/zapr.go:100 +0x535
github.com/go-logr/zapr.(*zapLogger).Info(0xc0012ce520, {0x19ed9b6, 0x3}, {0xc0012c98c0, 0x3, 0x3})
        /go/pkg/mod/github.com/go-logr/zapr@v0.4.0/zapr.go:127 +0x7e
github.com/splunk/splunk-operator/pkg/splunk/enterprise.handleAppRepoChanges({0x1d8e310, 0xc000cdb830}, {0x4, 0xc00076b340}, {0x1ddf960, 0xc001018900}, 0xc0010190c0, 0xc0008aad20, 0xc001008a00)
        /workspace/pkg/splunk/enterprise/util.go:710 +0x6ba
github.com/splunk/splunk-operator/pkg/splunk/enterprise.initAndCheckAppInfoStatus({0x1d8e310, 0xc000cdb830}, {0x7f1df08bbc18, 0xc000676eb0}, {0x1ddf960, 0xc001018900}, 0xc001018fa8, 0xc0010190c0)
        /workspace/pkg/splunk/enterprise/util.go:1081 +0x5c7
github.com/splunk/splunk-operator/pkg/splunk/enterprise.ApplyStandalone({0x1d8e310, 0xc000cdb830}, {0x7f1df08bbc18, 0xc000676eb0}, 0xc001018900)
        /workspace/pkg/splunk/enterprise/standalone.go:82 +0x4ee
github.com/splunk/splunk-operator/controllers.(*StandaloneReconciler).Reconcile(0xc000686e88, {0x1d8e310, 0xc000cdb830}, {{{0xc00034f590, 0xa}, {0xc0007cd4e8, 0x12}}})
        /workspace/controllers/standalone_controller.go:108 +0x46b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0000a3220, {0x1d8e310, 0xc000cdb740}, {{{0xc00034f590, 0x18ca140}, {0xc0007cd4e8, 0xc00076ac80}}})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:114 +0x222
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000a3220, {0x1d8e268, 0xc0007f5280}, {0x1849bc0, 0xc0003961e0})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:311 +0x2f2
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000a3220, {0x1d8e268, 0xc0007f5280})
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:223 +0x354

marcusschiesser commented 1 year ago

The only workaround that works for me right now is to delete the Standalone resource and then recreate it with the new app version, which is of course quite invasive.

sgontla commented 1 year ago

Hi @marcusschiesser, in your very first comment you mentioned: "When I change the location in the Standalone resource from myapp/v1.1.0/ to myapp/v1.1.1/ (this S3 folder contains a new version of the app myapp.tgz), the new version of the app is not installed."

We recommend using the same app source location and the same app package name across app version changes. Typically, an app source location can host multiple app packages, and when an app package changes, the App Framework still expects the same package name. The App Framework detects the change by periodically polling the app source location (as specified through appsRepoPollIntervalSeconds) and then upgrades that app package.
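
As a sketch, the recommended setup keeps the location stable and only replaces the package contents, reusing the volume from your example above (the path name current/ is just illustrative):

    appRepo:
      appsRepoPollIntervalSeconds: 600   # the location is re-checked at this interval
      defaults:
        volumeName: splunk-apps
        scope: local
      appSources:
        - name: app
          # keep this path stable; upload each new build as myapp.tgz to the
          # same key, and the changed package is picked up on the next poll
          location: myapp/current/
      volumes:
        - name: splunk-apps
          storageType: s3
          provider: aws
          path: mybucket/
          endpoint: https://s3-eu-central-1.amazonaws.com
          secretRef: s3-splunk-apps-secret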

Regarding the crash you mentioned, it was fixed in release 2.1.0.

marcusschiesser commented 1 year ago

@sgontla I know that the App Framework polls the app source location. That's useful if my build process uploads the build of a specific branch of an app (e.g. staging) to S3 and I want that build to be picked up automatically by the App Framework. In production environments, however, I don't want my build process to deploy the latest build automatically. Ideally, I want to select the version in the CR, as in the example above, similar to how I can change the Splunk version by changing the image attribute in the CR. What do you suggest in this case?
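
To illustrate the parallel, here is a minimal sketch (the apiVersion and image tag are just examples) of how I pin the Splunk version today versus how I would like to pin the app version:

    apiVersion: enterprise.splunk.com/v4   # example only
    kind: Standalone
    metadata:
      name: staging-playground
    spec:
      image: splunk/splunk:9.0.4           # pinning the Splunk version works by editing this field
      appRepo:
        appSources:
          - name: app
            location: myapp/v1.1.1/        # ideally, editing this would pin the app version the same way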

gaurav-splunk commented 1 year ago

@marcusschiesser Ideally, this problem would be solved if you could move to the latest Splunk Operator version (as suggested by @sgontla), but since you are not able to upgrade, you can try the following workaround for now. When you update the app source location, add a dummy app source (pointing to a valid S3 location, preferably with no apps in it) to the YAML file. Something like this:

  appSources:
    - name: app-1-1-1
      location: myapp/v1.1.1/
    - name: dummyAppSrc
      location: myapp/dummyApp/

This should update the init-container path. I have tested this locally and it works fine. Let me know how this goes for you.

marcusschiesser commented 1 year ago

@gaurav-splunk thanks, this workaround is working with operator 1.1.0

gaurav-splunk commented 1 year ago

@marcusschiesser can I close this issue now?

marcusschiesser commented 1 year ago

@gaurav-splunk we tried this workaround for a couple of days now, and it seems it's not working reliably. We are therefore now using the following approach: our release process creates a release folder containing a copy (as S3 doesn't support symlinks) of the current release version (e.g. v1.1.1 in the example above). This means our YAML looks like this:

  appSources:
      - name: app
        location: myapp/release/

If operator 2.1 reliably supports app updates by referencing an updated release folder (e.g. changing location: myapp/v1.1.1/ to location: myapp/v1.1.2/), then you can close this ticket; otherwise, this would be a feature request.

yaroslav-nakonechnikov commented 1 year ago

I'm also getting this:

2023-01-17T14:57:21.402380833Z  DPANIC  DownloadApp     odd number of arguments passed as key-value pairs for logging   {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "lm", "reconcileID": "c6034a2e-e14a-409e-a362-a172d3d852ee", "remoteFile": "splunk-apps/config-explorer/splunk-es-content-update_3560.tgz", "localFile": "/opt/splunk/appframework/downloadedApps/splunk-operator/LicenseManager/lm/local/Config Explorer/splunk-es-content-update_3560.tgz_c75bb8613a7c6cf4473996021bdbc354", "etag": "c75bb8613a7c6cf4473996021bdbc354", "ignored key": "splunk-apps/config-explorer/splunk-es-content-update_3560.tgz"}
github.com/splunk/splunk-operator/pkg/splunk/client.(*AWSS3Client).DownloadApp
        /workspace/pkg/splunk/client/awss3client.go:255
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*RemoteDataClientManager).DownloadApp
        /workspace/pkg/splunk/enterprise/util.go:826
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download
        /workspace/pkg/splunk/enterprise/afwscheduler.go:470
panic: odd number of arguments passed as key-value pairs for logging

goroutine 917 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000768480, {0xc0003fdd80, 0x1, 0x1})
        /go/pkg/mod/go.uber.org/zap@v1.21.0/zapcore/entry.go:232 +0x44c
go.uber.org/zap.(*Logger).DPanic(0x1b5deae?, {0x1bc8c40?, 0x18888e0?}, {0xc0003fdd80, 0x1, 0x1})
        /go/pkg/mod/go.uber.org/zap@v1.21.0/logger.go:220 +0x59
github.com/go-logr/zapr.(*zapLogger).handleFields(0xc0005e8c90, 0xffffffffffffffff, {0xc0010b9e30, 0x1, 0x1642d2b?}, {0xc0003fdcc0?, 0x1, 0xc001116c80?})
        /go/pkg/mod/github.com/go-logr/zapr@v1.2.3/zapr.go:147 +0xd3f
github.com/go-logr/zapr.(*zapLogger).Error(0xc0005e8c90, {0x1f37020?, 0xc000f238c0}, {0x1b7625b?, 0xc000420330?}, {0xc0010b9e30, 0x1, 0x1})
        /go/pkg/mod/github.com/go-logr/zapr@v1.2.3/zapr.go:216 +0x1ac
github.com/go-logr/logr.Logger.Error({{0x1f545d0?, 0xc0005e8c90?}, 0x2?}, {0x1f37020, 0xc000f238c0}, {0x1b7625b, 0x1a}, {0xc0010b9e30, 0x1, 0x1})
        /go/pkg/mod/github.com/go-logr/logr@v1.2.3/logr.go:279 +0xba
github.com/splunk/splunk-operator/pkg/splunk/client.(*AWSS3Client).DownloadApp(0xc000193900, {0x1f51998?, 0xc000afafc0?}, {{0xc00074b760, 0xa2}, {0xc000271ec0, 0x3d}, {0xc00004b480, 0x20}})
        /workspace/pkg/splunk/client/awss3client.go:255 +0x585
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*RemoteDataClientManager).DownloadApp(0x1f51998?, {0x1f51998, 0xc000afafc0}, {0xc000271ec0, 0x3d}, {0xc00074b760, 0xa2}, {0xc00004b480, 0x20})
        /workspace/pkg/splunk/enterprise/util.go:826 +0x1ad
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download(0xc0002e9030, {0x1f51998, 0xc000afafc0}, 0xc000ae5a90?, {{0x7f27c1667910, 0xc0002e2320}, {0x1f66a28, 0xc000590000}, 0xc000590700, 0xc0001e98f0, ...}, ...)
        /workspace/pkg/splunk/enterprise/afwscheduler.go:470 +0x61f
created by github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelinePhase).downloadWorkerHandler
        /workspace/pkg/splunk/enterprise/afwscheduler.go:556 +0x6be

And for now we don't split apps by version; we are trying to install everything from one place.

yaroslav-nakonechnikov commented 1 year ago

Also, I sometimes get this:

2023-01-17T15:38:38.5166246Z    ERROR   runPodCopyWorker        app package pod copy failed     {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "lm", "reconcileID": "94acf044-a16b-45a5-95d1-9aeee0af5aa0", "name": "lm", "namespace": "splunk-operator", "app name": "config-explorer_1715.tgz", "pod": "splunk-lm-license-manager-0", "stdout": "2", "stderr": "/bin/sh: line 1: test: /operator-staging/appframework/Config: binary operator expected\n", "failCount": 3,"error": "directory on Pod doesn't exist. stdout: 2, stdErr: /bin/sh: line 1: test: /operator-staging/appframework/Config: binary operator expected\n, err: %!s(<nil>)"}
github.com/splunk/splunk-operator/pkg/splunk/enterprise.runPodCopyWorker
        /workspace/pkg/splunk/enterprise/afwscheduler.go:786

But on the pod splunk-lm-license-manager-0 I can see the directory /operator-staging/appframework/Config

Why does it fail?

yaroslav-nakonechnikov commented 1 year ago

My comments should be ignored; that was solved in another ticket.

But I see that we are also affected by this issue.

akondur commented 5 months ago

@marcusschiesser @yaroslav-nakonechnikov Does this issue still persist with the latest releases?

yaroslav-nakonechnikov commented 5 months ago

@akondur I can't say; we are not using the App Framework at the moment.