marcusschiesser opened this issue 1 year ago
I tried a couple of workarounds:
- Deleting the StatefulSet: this doesn't work and brings the Standalone resource into an Error state.
- Adding the version as a suffix to the app source name, e.g.:
appSources:
- name: app-1-1-1
location: myapp/v1.1.1/
This crashes the operator with:
2022-12-02T04:48:58.098Z DPANIC controller.standalone.handleAppRepoChanges odd number of arguments passed as key-value pairs for logging {"reconciler group": "enterprise.splunk.com", "reconciler kind": "Standalone", "name": "staging-playground", "namespace": "playground", "kind": "Standalone", "name": "staging-playground", "namespace": "playground", "ignored key": "Reason: App source is mising in config or remote listing"}
github.com/splunk/splunk-operator/pkg/splunk/enterprise.initAndCheckAppInfoStatus
/workspace/pkg/splunk/enterprise/util.go:1081
github.com/splunk/splunk-operator/pkg/splunk/enterprise.ApplyStandalone
/workspace/pkg/splunk/enterprise/standalone.go:82
github.com/splunk/splunk-operator/controllers.(*StandaloneReconciler).Reconcile
/workspace/controllers/standalone_controller.go:108
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:227
panic: odd number of arguments passed as key-value pairs for logging
goroutine 622 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000262cc0, {0xc00076b440, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@v1.19.0/zapcore/entry.go:232 +0x446
go.uber.org/zap.(*Logger).DPanic(0x19ef7e7, {0x1a53aef, 0x174fdc0}, {0xc00076b440, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@v1.19.0/logger.go:220 +0x59
github.com/go-logr/zapr.handleFields(0xc00113f980, {0xc0012c98c0, 0x3, 0xa}, {0x0, 0x0, 0x174fb40})
/go/pkg/mod/github.com/go-logr/zapr@v0.4.0/zapr.go:100 +0x535
github.com/go-logr/zapr.(*zapLogger).Info(0xc0012ce520, {0x19ed9b6, 0x3}, {0xc0012c98c0, 0x3, 0x3})
/go/pkg/mod/github.com/go-logr/zapr@v0.4.0/zapr.go:127 +0x7e
github.com/splunk/splunk-operator/pkg/splunk/enterprise.handleAppRepoChanges({0x1d8e310, 0xc000cdb830}, {0x4, 0xc00076b340}, {0x1ddf960, 0xc001018900}, 0xc0010190c0, 0xc0008aad20, 0xc001008a00)
/workspace/pkg/splunk/enterprise/util.go:710 +0x6ba
github.com/splunk/splunk-operator/pkg/splunk/enterprise.initAndCheckAppInfoStatus({0x1d8e310, 0xc000cdb830}, {0x7f1df08bbc18, 0xc000676eb0}, {0x1ddf960, 0xc001018900}, 0xc001018fa8, 0xc0010190c0)
/workspace/pkg/splunk/enterprise/util.go:1081 +0x5c7
github.com/splunk/splunk-operator/pkg/splunk/enterprise.ApplyStandalone({0x1d8e310, 0xc000cdb830}, {0x7f1df08bbc18, 0xc000676eb0}, 0xc001018900)
/workspace/pkg/splunk/enterprise/standalone.go:82 +0x4ee
github.com/splunk/splunk-operator/controllers.(*StandaloneReconciler).Reconcile(0xc000686e88, {0x1d8e310, 0xc000cdb830}, {{{0xc00034f590, 0xa}, {0xc0007cd4e8, 0x12}}})
/workspace/controllers/standalone_controller.go:108 +0x46b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0000a3220, {0x1d8e310, 0xc000cdb740}, {{{0xc00034f590, 0x18ca140}, {0xc0007cd4e8, 0xc00076ac80}}})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:114 +0x222
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0000a3220, {0x1d8e268, 0xc0007f5280}, {0x1849bc0, 0xc0003961e0})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:311 +0x2f2
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0000a3220, {0x1d8e268, 0xc0007f5280})
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.10.0/pkg/internal/controller/controller.go:223 +0x354
The only workaround that works for me right now is to delete the Standalone resource and then recreate it with the new app version, which is of course quite invasive.
Hi @marcusschiesser, in your very first message you mentioned that:
When I change the location in the Standalone resource from myapp/v1.1.0/ to myapp/v1.1.1/ (this S3 folder contains a new version of the app myapp.tgz), the new version of the app is not installed.
We recommend keeping the same app source location and the same app package name across app version changes. Typically, an app source location can host multiple app packages, and when an app package changes, the App Framework still expects the same package name. The App Framework detects the change by periodically polling the app source location (at the interval specified by appsRepoPollIntervalSeconds) and then upgrades that app package.
Regarding the crash you mentioned, it was fixed in release 2.1.0.
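For reference, this interval is set in the appRepo section of the CR. A minimal sketch only: the bucket, endpoint, volume, and secret names below are placeholders, and the exact schema and apiVersion depend on your operator version, so please check the App Framework docs.
apiVersion: enterprise.splunk.com/v3
kind: Standalone
metadata:
  name: staging-playground
  namespace: playground
spec:
  appRepo:
    # Poll the remote app source for changes every 10 minutes.
    appsRepoPollIntervalSeconds: 600
    defaults:
      volumeName: volume_app_repo
      scope: local
    volumes:
      - name: volume_app_repo
        storageType: s3
        provider: aws
        path: mybucket
        endpoint: https://s3.us-west-2.amazonaws.com
        secretRef: s3-secret
    appSources:
      - name: app
        location: myapp/v1.1.0/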
@sgontla I know about the feature that makes the App Framework poll the app source location. That's useful if my build process uploads the build of a specific branch of an app (e.g. staging) to S3 and I want that build to be picked up automatically by the App Framework.
In production environments, however, I don't want my build process to automatically deploy the latest build. Ideally, I want to select the version to use in the CR, as in the example above, similar to how I can change the Splunk version by changing the image attribute in the CR.
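To illustrate the analogy (just a sketch; the image tag and paths are placeholders, and all other spec fields are omitted):
spec:
  # Pinning the Splunk version already works by changing the image attribute:
  image: splunk/splunk:9.0.0
  appRepo:
    appSources:
      # ...and this is how I would like to pin the app version:
      - name: app
        location: myapp/v1.1.1/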
What do you suggest to do in this case?
@marcusschiesser Ideally, this problem would be solved if you could move to the latest Splunk Operator version (as suggested by @sgontla), but since you are not able to upgrade, for now you can try the following workaround: when you update the app source location, add a dummy app source (pointing to a valid S3 location, probably with no apps) in the YAML file. Something like this:
appSources:
- name: app-1-1-1
location: myapp/v1.1.1/
- name: dummyAppSrc
location: myapp/dummyApp/
This should update the init-container path. I have tested this locally and it works fine. Let me know how this goes for you.
@gaurav-splunk thanks, this workaround is working with operator 1.1.0
@marcusschiesser can I close this issue now?
@gaurav-splunk we have now tried this workaround for a couple of days and it doesn't seem to work reliably. We are therefore now using the following approach: our release process creates a release folder that contains a copy (as S3 doesn't support symlinks) of the current release version (e.g. v1.1.1 in the example above). This means our YAML looks like this:
appSources:
- name: app
location: myapp/release/
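The copy into the release folder happens in our release pipeline, roughly like the following step (a sketch only; GitHub Actions syntax is assumed, and the bucket and paths are the ones from the examples above):
# Hypothetical CI step that publishes the current app version to the release folder.
- name: Publish current app version to the release folder
  # Mirror the versioned build into myapp/release/; --delete removes
  # packages left over from the previous version.
  run: |
    aws s3 sync s3://mybucket/myapp/v1.1.1/ s3://mybucket/myapp/release/ --delete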
If operator 2.1 reliably supports app updates by referencing an updated release folder (e.g. changing location: myapp/v1.1.1/ to location: myapp/v1.1.2/), then you can close this ticket; otherwise, this would be a feature request.
I'm also getting this:
2023-01-17T14:57:21.402380833Z DPANIC DownloadApp odd number of arguments passed as key-value pairs for logging {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "lm", "reconcileID": "c6034a2e-e14a-409e-a362-a172d3d852ee", "remoteFile": "splunk-apps/config-explorer/splunk-es-content-update_3560.tgz", "localFile": "/opt/splunk/appframework/downloadedApps/splunk-operator/LicenseManager/lm/local/Config Explorer/splunk-es-content-update_3560.tgz_c75bb8613a7c6cf4473996021bdbc354", "etag": "c75bb8613a7c6cf4473996021bdbc354", "ignored key": "splunk-apps/config-explorer/splunk-es-content-update_3560.tgz"}
github.com/splunk/splunk-operator/pkg/splunk/client.(*AWSS3Client).DownloadApp
/workspace/pkg/splunk/client/awss3client.go:255
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*RemoteDataClientManager).DownloadApp
/workspace/pkg/splunk/enterprise/util.go:826
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download
/workspace/pkg/splunk/enterprise/afwscheduler.go:470
panic: odd number of arguments passed as key-value pairs for logging
goroutine 917 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000768480, {0xc0003fdd80, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@v1.21.0/zapcore/entry.go:232 +0x44c
go.uber.org/zap.(*Logger).DPanic(0x1b5deae?, {0x1bc8c40?, 0x18888e0?}, {0xc0003fdd80, 0x1, 0x1})
/go/pkg/mod/go.uber.org/zap@v1.21.0/logger.go:220 +0x59
github.com/go-logr/zapr.(*zapLogger).handleFields(0xc0005e8c90, 0xffffffffffffffff, {0xc0010b9e30, 0x1, 0x1642d2b?}, {0xc0003fdcc0?, 0x1, 0xc001116c80?})
/go/pkg/mod/github.com/go-logr/zapr@v1.2.3/zapr.go:147 +0xd3f
github.com/go-logr/zapr.(*zapLogger).Error(0xc0005e8c90, {0x1f37020?, 0xc000f238c0}, {0x1b7625b?, 0xc000420330?}, {0xc0010b9e30, 0x1, 0x1})
/go/pkg/mod/github.com/go-logr/zapr@v1.2.3/zapr.go:216 +0x1ac
github.com/go-logr/logr.Logger.Error({{0x1f545d0?, 0xc0005e8c90?}, 0x2?}, {0x1f37020, 0xc000f238c0}, {0x1b7625b, 0x1a}, {0xc0010b9e30, 0x1, 0x1})
/go/pkg/mod/github.com/go-logr/logr@v1.2.3/logr.go:279 +0xba
github.com/splunk/splunk-operator/pkg/splunk/client.(*AWSS3Client).DownloadApp(0xc000193900, {0x1f51998?, 0xc000afafc0?}, {{0xc00074b760, 0xa2}, {0xc000271ec0, 0x3d}, {0xc00004b480, 0x20}})
/workspace/pkg/splunk/client/awss3client.go:255 +0x585
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*RemoteDataClientManager).DownloadApp(0x1f51998?, {0x1f51998, 0xc000afafc0}, {0xc000271ec0, 0x3d}, {0xc00074b760, 0xa2}, {0xc00004b480, 0x20})
/workspace/pkg/splunk/enterprise/util.go:826 +0x1ad
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download(0xc0002e9030, {0x1f51998, 0xc000afafc0}, 0xc000ae5a90?, {{0x7f27c1667910, 0xc0002e2320}, {0x1f66a28, 0xc000590000}, 0xc000590700, 0xc0001e98f0, ...}, ...)
/workspace/pkg/splunk/enterprise/afwscheduler.go:470 +0x61f
created by github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelinePhase).downloadWorkerHandler
/workspace/pkg/splunk/enterprise/afwscheduler.go:556 +0x6be
And for now we don't split apps by version; we try to install everything from one location.
Also, sometimes I'm seeing this:
2023-01-17T15:38:38.5166246Z ERROR runPodCopyWorker app package pod copy failed {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "lm", "reconcileID": "94acf044-a16b-45a5-95d1-9aeee0af5aa0", "name": "lm", "namespace": "splunk-operator", "app name": "config-explorer_1715.tgz", "pod": "splunk-lm-license-manager-0", "stdout": "2", "stderr": "/bin/sh: line 1: test: /operator-staging/appframework/Config: binary operator expected\n", "failCount": 3,"error": "directory on Pod doesn't exist. stdout: 2, stdErr: /bin/sh: line 1: test: /operator-staging/appframework/Config: binary operator expected\n, err: %!s(<nil>)"}
github.com/splunk/splunk-operator/pkg/splunk/enterprise.runPodCopyWorker
/workspace/pkg/splunk/enterprise/afwscheduler.go:786
But on the pod splunk-lm-license-manager-0 I can see the directory /operator-staging/appframework/Config. Why does it fail?
My comments should be ignored; it was solved in another ticket. But I see that we are also affected by that.
@marcusschiesser @yaroslav-nakonechnikov does this issue still persist with the latest releases?
@akondur I can't say; we are not using the App Framework at the moment.
Please select the type of request
Bug

Tell us more

Describe the request
I am deploying a Standalone resource using the App Framework with the following configuration:
The S3 folder myapp/v1.1.0/ contains an app named myapp.tgz.
When I change the location in the Standalone resource from myapp/v1.1.0/ to myapp/v1.1.1/ (this S3 folder contains a new version of the app myapp.tgz), the new version of the app is not installed. What happens instead is that the pod reinstalls the old version of the app again.
It seems that the problem is that the amazon/aws-cli container mounted to the pod is still being called with s3 sync s3://mybucket/myapp/v1.1.0/ /init-apps/app/ instead of s3 sync s3://mybucket/myapp/v1.1.1/ /init-apps/app/.

Expected behavior

Splunk setup on K8S

Reproduction/Testing steps

K8s environment