Closed paheath closed 3 months ago
I guess this is related: https://github.com/splunk/splunk-operator/issues/1030#issuecomment-1429444280
Hello @yaroslav-nakonechnikov @paheath we will work on this change and get back to you
Hello @paheath , we are exploring possible solutions for path style S3 URLs. Meanwhile, can you please provide an example of the working appFramework configuration (with the modified Splunk operator image) for path style URLs?
Also, path style URLs will be discontinued per AWS documentation.
Currently, Amazon S3 supports both virtual-hosted–style and path-style URL access in all AWS Regions. However, path-style URLs will be discontinued in the future. For more information, see the following Important note.
This is an excerpt from my helm chart, and the underlying operator image is modified as indicated in the original bug description. I don't think any of the value substitutions necessarily impact the functionality. I've defined it in the yaml as documented here https://splunk.github.io/splunk-operator/AppFramework.html
appRepo:
  appsRepoPollIntervalSeconds: {{ .Values.configPollInterval }}
  defaults:
    volumeName: {{ .Values.volumeName }}
  appSources:
    - name: node
      location: node/
      scope: local
  volumes:
    - name: {{ .Values.volumeName }}
      storageType: s3
      path: {{ .Values.bucketPath }}/
      provider: aws
      region: {{ .Values.bucketRegion }}
      endpoint: {{ .Values.bucketEndpoint }}
      secretRef: {{ .Values.secretRef }}
Hi @paheath , thanks for the example above. To further test our solution, are you able to let us know the storage provider being used to test path style S3 URLs? Currently, by default, AWS S3 buckets support both path style and virtual hosted style URLs. I was able to test path style specifically on S3 buckets.
I'm testing against an on-prem s3-compatible NAS. I think testing against any s3-compatible storage might be sufficient, as long as you can confirm the outbound request is hitting the path-style endpoint when configured to do so. Maybe even locally block outbound traffic to the virtual endpoint. Testing might be similar to how the smartstore path-style config is tested.
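One way to make the "confirm the outbound request is hitting the path-style endpoint" check concrete is to classify a captured request by where the bucket name appears. The following is only a sketch using the Go standard library; `isPathStyle`, `classify`, the bucket name `apps`, and the host `nas.example.internal` are hypothetical names for illustration and are not part of the operator.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isPathStyle reports whether req addresses the bucket in path style,
// i.e. the bucket is the first path segment and not the leftmost host label.
// Hypothetical helper for illustration only.
func isPathStyle(req *http.Request, bucket string) bool {
	inPath := req.URL.Path == "/"+bucket || strings.HasPrefix(req.URL.Path, "/"+bucket+"/")
	inHost := strings.HasPrefix(req.Host, bucket+".")
	return inPath && !inHost
}

// classify builds a server-side view of a GET request for rawURL
// (no network traffic is generated) and classifies it.
func classify(rawURL, bucket string) bool {
	req := httptest.NewRequest(http.MethodGet, rawURL, nil)
	return isPathStyle(req, bucket)
}

func main() {
	urls := []string{
		"https://nas.example.internal/apps/node/app.tgz", // bucket in the path
		"https://apps.nas.example.internal/node/app.tgz", // bucket in the host
	}
	for _, u := range urls {
		fmt.Printf("%s pathStyle=%v\n", u, classify(u, "apps"))
	}
}
```

In a real test the same check could run inside a handler on a stub S3 endpoint, so that any virtual-hosted-style request from the operator fails the test.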
@paheath Are you able to test the changes in the MR to see if it's working before we merge? If there is something missing, please comment on the MR or here and it will be fixed.
@paheath Please let us know if this solution works so we can merge it.
Unfortunately I can't get this change to work. I'm seeing my clustermanager instance reporting Ready, but all the apps in the description report this:
appDeploymentInfo:
- appName: myapp.tgz
  appPackageTopFolder: ""
  deployStatus: 1
  isUpdate: false
  objectHash: <hash>
  phaseInfo:
    failCount: 3
    phase: download
    status: 199
  repoState: 1
and the associated indexer cluster never reconciles. I don't see the apps appear in the pod under /opt/splunk/etc/apps or /opt/splunk/etc/manager-apps
Hey @paheath , can you share any Splunk Operator pod logs indicating any errors?
The CR status code 199 indicates that the app package was not downloaded properly.
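As a rough illustration of how this status could be checked in bulk, the sketch below scans the appDeploymentInfo section of a CR status (exported as JSON) for apps stuck in the download phase with status 199. The field names are taken from the status excerpt above and the meaning of 199 from this comment; the type and function names are hypothetical and do not come from the operator's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusDownloadError is the status value reported in this thread for an
// app package that was not downloaded properly.
const statusDownloadError = 199

// phaseInfo and appInfo mirror only the fields shown in the status excerpt.
type phaseInfo struct {
	FailCount int    `json:"failCount"`
	Phase     string `json:"phase"`
	Status    int    `json:"status"`
}

type appInfo struct {
	AppName   string    `json:"appName"`
	PhaseInfo phaseInfo `json:"phaseInfo"`
}

// failedDownloads returns the names of apps whose download phase reports
// statusDownloadError. Hypothetical helper for illustration only.
func failedDownloads(raw []byte) ([]string, error) {
	var doc struct {
		AppDeploymentInfo []appInfo `json:"appDeploymentInfo"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var names []string
	for _, a := range doc.AppDeploymentInfo {
		if a.PhaseInfo.Phase == "download" && a.PhaseInfo.Status == statusDownloadError {
			names = append(names, a.AppName)
		}
	}
	return names, nil
}

func main() {
	raw := []byte(`{"appDeploymentInfo":[{"appName":"myapp.tgz","phaseInfo":{"failCount":3,"phase":"download","status":199}}]}`)
	names, err := failedDownloads(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("apps with failed downloads:", names)
}
```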
Appears to be running through this periodically for the nodes using app framework:
2024-06-04T00:47:27.481032478Z INFO updatePplnWorkerPhaseInfo changing the status {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "appName": "app.tgz", "old status": "Download In Progress", "new status": "Download Pending"}
2024-06-04T00:47:27.657331829Z INFO downloadPhaseManager Download worker got a run slot {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "name": "lm", "namespace": "test", "App name": "app.tgz", "digest": "<digest>"}
2024-06-04T00:47:27.663811632Z INFO isAppAlreadyDownloaded App not present on operator pod {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "app name": "app.tgz"}
2024-06-04T00:47:27.663872366Z INFO updatePplnWorkerPhaseInfo changing the status {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "appName": "app.tgz", "old status": "Download Pending", "new status": "Download In Progress"}
2024-06-04T00:47:27.664103782Z INFO GetRemoteStorageClient Creating the client {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "name": "lm", "namespace": "test", "volume": "config-repo", "bucket": "<bucket>", "bucket path": "lic_manager/"}
2024-06-04T00:47:27.664283386Z INFO InitAWSClientSession AWS Client Session initialization successful. {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "region": "zone1", "TLS Version": "TLS 1.2"}
2024-06-04T00:47:27.820996027Z ERROR DownloadApp Unable to download item {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "remoteFile": "lic_manager/app.tgz", "localFile": "/opt/splunk/appframework/downloadedApps/test/LicenseManager/lm/local/lic_manager/app.tgz_<etag>", "etag": "<etag>", "RemoteFile": "lic_manager/app.tgz", "error": "stream error: stream ID 7; NO_ERROR; received from peer"}
github.com/splunk/splunk-operator/pkg/splunk/client.(*AWSS3Client).DownloadApp
/workspace/pkg/splunk/client/awss3client.go:277
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*RemoteDataClientManager).DownloadApp
/workspace/pkg/splunk/enterprise/util.go:842
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download
/workspace/pkg/splunk/enterprise/afwscheduler.go:497
2024-06-04T00:47:27.821131931Z ERROR PipelineWorker.Download() unable to download app {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "56c9a258-e763-484e-9ffe-6888469133de", "name": "lm", "namespace": "test", "App name": "app.tgz", "objectHash": "<digest>", "appName": "app.tgz", "error": "stream error: stream ID 7; NO_ERROR; received from peer"}
github.com/splunk/splunk-operator/pkg/splunk/enterprise.(*PipelineWorker).download
/workspace/pkg/splunk/enterprise/afwscheduler.go:499
This is the cluster manager app framework spec I'm using. Same as before with s3PathUrl: true set.
appRepo:
  appsRepoPollIntervalSeconds: {{ .Values.configPollInterval }}
  defaults:
    volumeName: {{ .Values.volumeName }}
  appSources:
    - name: node
      location: node/
      scope: local
  volumes:
    - name: {{ .Values.volumeName }}
      storageType: s3
      path: {{ .Values.bucketPath }}/
      provider: aws
      region: {{ .Values.bucketRegion }}
      s3PathUrl: true
      endpoint: {{ .Values.bucketEndpoint }}
      secretRef: {{ .Values.secretRef }}
Hey @paheath , while we are debugging further, were you able to successfully install the new CRDs on the new cluster before deploying the clusterManager CR? Please let us know.
Yes, I updated the CRDs beforehand. And the cluster manager accepted the s3PathUrl setting.
Well, maybe it did not take. In the cluster manager spec s3PathUrl is set to true. But when I describe the cluster manager, I see status.Smartstore.Volumes.s3PathUrl is false. Was s3PathUrl added for smartstore also?
Disregard, I see status.AppContext.AppRepo.AppSources.Volumes.s3PathUrl is set to true as expected. I didn't catch that the false setting was in the smartstore status section.
Thank you @paheath . I believe we are setting the pathStyleUrl in the AWS S3 client, but as an update to the already created S3 client before creating the downloader, rather than during client creation as in your successful example here. Some posts online recommend against updating the client once it has been created. I will try to rework the changes so that this option is set during creation.
@paheath Are you able to try it out with the latest changes?
@paheath Please let us know if the latest changes are working.
Forgive me, my bandwidth is limited at the moment. I will do my best to get to this today.
With the latest patch I'm seeing the same "unable to download item" error logs as before. The general behavior is also the same, blocking indexer cluster creation.
Hi @paheath , thank you for testing. Are you able to provide us Splunk operator pod logs similar to this:
2024-06-06T01:03:17.019639356Z INFO InitAWSClientSession Setting up AWS SDK client {"controller": "standalone", "controllerGroup": "enterprise.splunk.com", "controllerKind": "Standalone", "Standalone": {"name":"example","namespace":"splunk-operator"}, "namespace": "splunk-operator", "name": "ido", "reconcileID": "4c684039-fe1b-4bea-b550-ce618f2ef57e", "regionWithEndpoint": "us-west-2|https://s3-us-west-2.amazonaws.com", "pathStyleUrl": true}
The changes in the MR were made with this issue's description in mind, and the changes were made here.
I see similar logs for all nodes using the app framework (standalone, licensemanager, clustermanager)
2024-06-14T21:32:41.612345298Z INFO InitAWSClientSession Setting up AWS SDK client {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "<id>", "regionWithEndpoint": "zone1|https://<endpoint-fqdn>", "pathStyleUrl": true}
2024-06-14T21:32:41.61252801Z INFO InitAWSClientSession AWS Client Session initialization successful. {"controller": "licensemanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "LicenseManager", "LicenseManager": {"name":"lm","namespace":"test"}, "namespace": "test", "name": "lm", "reconcileID": "<id>", "region": "<region>", "TLS Version": "TLS 1.2"}
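For anyone checking many pods, a log line of this shape can be inspected programmatically. The sketch below assumes the format seen in the lines above (free text followed by a JSON object containing "regionWithEndpoint" and "pathStyleUrl"); `clientSetup` is a hypothetical helper and the sample endpoint is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// clientSetup extracts the endpoint and the path-style flag from an
// "InitAWSClientSession Setting up AWS SDK client" log line.
// Hypothetical helper; it assumes the JSON payload starts at the first "{".
func clientSetup(line string) (endpoint string, pathStyle bool, err error) {
	i := strings.Index(line, "{")
	if i < 0 {
		return "", false, fmt.Errorf("no JSON payload in log line")
	}
	var fields struct {
		RegionWithEndpoint string `json:"regionWithEndpoint"`
		PathStyleURL       *bool  `json:"pathStyleUrl"`
	}
	if err := json.Unmarshal([]byte(line[i:]), &fields); err != nil {
		return "", false, err
	}
	if fields.PathStyleURL == nil {
		return "", false, fmt.Errorf("log line has no pathStyleUrl field")
	}
	// The value is logged as "<region>|<endpoint>".
	_, endpoint, _ = strings.Cut(fields.RegionWithEndpoint, "|")
	return endpoint, *fields.PathStyleURL, nil
}

func main() {
	line := `2024-06-14T21:32:41Z INFO InitAWSClientSession Setting up AWS SDK client {"controller": "licensemanager", "regionWithEndpoint": "zone1|https://nas.example.internal", "pathStyleUrl": true}`
	endpoint, pathStyle, err := clientSetup(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("endpoint=%s pathStyle=%v\n", endpoint, pathStyle)
}
```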
I've been able to test this a little more thoroughly today. I only had to add that one line to make this work previously, but I was testing on top of 2.4.0. I was able to reproduce this successfully on top of 2.4.0 today, but cherrypicking the one-line change on top of 2.5.2 did not work. Can you think of anything that has changed between 2.4.0 and 2.5.2 that would affect the behavior of the aws s3 client? I compared the two releases, but I couldn't see anything obvious. I assume whatever is breaking this in 2.5.2 is also breaking your PR.
Hi @paheath , after the comparison between 2.4.0 and 2.5.2 I couldn't see any major differences that would cause the aws sdk client to behave differently.
We just released 2.6.0. The MR has been rebased. Could you please try with the new version?
Hey @paheath , did you get a chance to try with 2.6.0? If it's not working can you please open a Splunk support case with these details?
Closing the issue for now. Please re-open a Splunk support ticket if the issue persists.
Please select the type of request
Enhancement
Tell us more
Describe the request I am deploying the operator in an on-prem environment with a storage solution that only supports path style s3 URLs. As far as I can tell, the operator defaults to using virtual host style s3 URLs to download apps. I propose making the current behavior remain the default, and provide an option in the AppFramework spec to explicitly set the s3 URLs to path style. I rebuilt the operator with S3ForcePathStyle: aws.Bool(true) added here and the app framework worked as expected. Smartstore offers a similar option to specify the url version, and defaults to path style. See remote.s3.url_version here.
Expected behavior Force the s3 client to use path style URLs when downloading apps, when set as such in the AppFramework spec.
Splunk setup on K8S SearchHeadCluster, IndexerCluster, ClusterManager, LicenseManager, MonitoringConsole, and Standalone heavy forwarder.
Reproduction/Testing steps Enable path style s3 URLs via the AppFramework spec. Verify that apps are correctly downloaded and installed.
K8s environment On-prem k8s cluster with on-prem s3-compatible NAS.
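For readers unfamiliar with the two addressing styles in this request, the difference can be sketched with the Go standard library alone. This is an illustration, not the operator's or the AWS SDK's implementation; `objectURL`, the bucket `apps`, and the endpoint `nas.example.internal` are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// objectURL builds the URL of an object in an S3-compatible store.
//   pathStyle=true:  <scheme>://<endpoint-host>/<bucket>/<key>
//   pathStyle=false: <scheme>://<bucket>.<endpoint-host>/<key>
// Hypothetical helper for illustration only.
func objectURL(endpoint, bucket, key string, pathStyle bool) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("endpoint %q has no host (is the scheme missing?)", endpoint)
	}
	key = strings.TrimPrefix(key, "/")
	if pathStyle {
		// The bucket travels in the path; the host is the endpoint as configured.
		u.Path = "/" + bucket + "/" + key
	} else {
		// The bucket becomes a DNS label, so the storage must resolve
		// <bucket>.<endpoint-host>, which many on-prem systems do not.
		u.Host = bucket + "." + u.Host
		u.Path = "/" + key
	}
	return u.String(), nil
}

func main() {
	for _, pathStyle := range []bool{true, false} {
		s, err := objectURL("https://nas.example.internal", "apps", "node/app.tgz", pathStyle)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("pathStyle=%v %s\n", pathStyle, s)
	}
}
```

In the AWS SDK for Go (v1), the S3ForcePathStyle field on aws.Config selects between these two forms, which is why the one-line change described above is enough to switch the client's behavior.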