splunk / splunk-operator

Splunk Operator for Kubernetes

splunk smartstore not using pod role #627

Open dpachiappan opened 2 years ago

dpachiappan commented 2 years ago

Hi,

Is there a way to configure splunk smartstore to authenticate via AWS_WEB_IDENTITY_TOKEN_FILE that is injected into the pod via service account mapping with IamRole?

Currently, the indexer is failing to upload index data to the S3 bucket because it is trying to use the instance role, which doesn't have the required permissions.

akondur commented 2 years ago

Hi @dpachiappan, we are currently reviewing SmartStore connectivity (via splunkd) to S3 using the AWS_WEB_IDENTITY_TOKEN_FILE token (service account mapped to an IAM role) mounted on the pod. Contrary to the suggestion on a different GitHub issue, the Splunk container is able to connect to S3 (via aws-cli commands) when it is mapped to a service account, without running as root.

Example Standalone yaml running with the default securityContext 41812 (runAsUser & fsGroup):

apiVersion: enterprise.splunk.com/v3
kind: Standalone
metadata:
  name: test3
  finalizers:
  - enterprise.splunk.com/delete-pvc
spec:
  serviceAccount: splunk-operator-controller-manager

The required env variables are added to the pod:

[splunk@splunk-test3-standalone-0 splunk]$ env | grep -i aws
AWS_DEFAULT_REGION=us-west-2
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_REGION=us-west-2
AWS_ROLE_ARN=arn:aws:iam::667741767953:role/akondur-s3-role

Note: the serviceAccount splunk-operator-controller-manager is mapped to the IAM role akondur-s3-role, which has permissions for S3 Read/Write/List operations.
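For reference, IRSA wires the IAM role to the pod through an annotation on the ServiceAccount; EKS then injects the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE env variables shown above. A minimal sketch (the names and role ARN are taken from this thread's example, not an official manifest):

```yaml
# Hypothetical ServiceAccount manifest; the eks.amazonaws.com/role-arn
# annotation is what triggers EKS to inject the web identity token and
# AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE into pods using this SA.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: splunk-operator-controller-manager
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::667741767953:role/akondur-s3-role
```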

Verifying S3 connectivity on container:

[splunk@splunk-test3-standalone-0 tmp]$ /tmp/bin/aws --version
aws-cli/2.6.4 Python/3.9.11 Linux/5.4.172-90.336.amzn2.x86_64 exe/x86_64.rhel.8 prompt/off
[splunk@splunk-test3-standalone-0 tmp]$ /tmp/bin/aws s3 ls s3://arjunk-s3-bucket/smartstore/
                           PRE _audit/
                           PRE _internal/
                           PRE _introspection/
                           PRE _metrics/
2022-05-06 17:19:06          0
[splunk@splunk-test3-standalone-0 tmp]$ /tmp/bin/aws s3 ls s3://arjunk-s3-bucket/
                           PRE smartstore/
[splunk@splunk-test3-standalone-0 tmp]$

We are setting the security context with fsGroup 41812, which allows access to the token. Read the issue here for reference.

Read access provided for the mounted token via EKS:

[splunk@splunk-test3-standalone-0 serviceaccount]$ ls -al /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token
-rw-r----- 1 splunk splunk 1084 May 13 01:29 /var/run/secrets/eks.amazonaws.com/serviceaccount/..data/token
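A quick way to verify the same preconditions from inside the container is a small Python snippet that checks the injected env variables and the token file's readability for the splunk user. This is a diagnostic sketch, not part of the operator; the env variable names are the standard ones EKS injects:

```python
import os


def check_irsa(env=os.environ):
    """Return a list of problems preventing web-identity auth; empty if OK."""
    problems = []
    for var in ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE"):
        if var not in env:
            problems.append(f"missing env var {var}")
    token = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token and not os.access(token, os.R_OK):
        # Typically an fsGroup mismatch: the token is mounted 0640 root:<fsGroup>
        problems.append(f"token file {token} is not readable (check fsGroup)")
    return problems


if __name__ == "__main__":
    for line in check_irsa() or ["IRSA env looks OK"]:
        print(line)
```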
satellite-no commented 1 year ago

I assume that since this goes through splunkd, it would require changes to core Splunk outside of the Splunk Operator. Is there anything we can do as a community to help speed this review along? Voting on a feature request? Talking to our Splunk partner reps?

Glad to help where I can.

marcusschiesser commented 1 year ago

Couldn't agree more with @satellite-no - I have the same issue, and it's a big blocker for using SmartStore with EKS in a secure manner. Using the operator requires SmartStore, SmartStore requires AWS S3, and Kubernetes clusters on AWS usually run on EKS with IRSA - so how are we supposed to set up a secure indexer cluster with the operator at all?

In addition to what's written above:

I can also access my S3 storage from my pod using the AWS SDK for Python, boto3 (an easier test case than using the AWS CLI):

# pip install boto3
# python

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)

And here's the misleading error message I got from Splunk when accessing the storage:

indexer-0 Problem parsing indexes.conf: Cannot load IndexConfig: Unable to load remote volume "s3_volume" of scheme "s3" referenced by index "_audit": Could not find access_key and/or secret_key in a configuration file, in environment variables or via the AWS metadata endpoint.
indexer-0 Validating databases (splunkd validatedb) failed with code '1'.  If you cannot resolve the issue(s) above after consulting documentation, please file a case online at http://www.splunk.com/page/submit_issue
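For context, the error comes from splunkd resolving credentials for the remote volume. A typical SmartStore volume stanza looks like the sketch below (the volume name is taken from the error above; the bucket path is a placeholder). When access_key/secret_key are omitted, splunkd falls back to its own credential chain, which at the time of this thread tried the instance-metadata endpoint but not the web identity token:

```ini
# indexes.conf - hypothetical minimal SmartStore volume; bucket/path are placeholders
[volume:s3_volume]
storageType = remote
path = s3://my-bucket/smartstore
# No remote.s3.access_key / remote.s3.secret_key set: splunkd must obtain
# credentials itself (config, env variables, AWS metadata endpoint), which
# is where web-identity (IRSA) support was missing before Splunk 9.0.5.

[default]
remotePath = volume:s3_volume/$_index_name
```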
marcusschiesser commented 1 year ago

Meanwhile, I think the operator project can update the SmartStore docs:

  1. Don't just write "IAM roles" - this leads people in the wrong direction, as most will think of pod roles. I assume node roles are working - document this explicitly.
  2. Add a test case that works from a Splunk pod without modification, using Python (see my last comment).
satellite-no commented 1 year ago

Please vote to add this to Splunk Enterprise https://ideas.splunk.com/ideas/EID-I-1730

yaroslav-nakonechnikov commented 1 year ago

Subscribing - this is extremely critical.

nathan-bowman commented 1 year ago

Subscribed, this affects me as well. AWS IRSA is the standard now; this should be prioritized.

dw-seanelliott commented 12 months ago

@vivekr-splunk Any idea internally at Splunk when this might get released? https://ideas.splunk.com/ideas/EID-I-1730

dw-seanelliott commented 12 months ago

Oh man! It looks like this might be out in Splunk 9.1.1: https://docs.splunk.com/Documentation/Splunk/9.1.1/Indexer/SmartStoresecuritystrategies

vivekr-splunk commented 12 months ago

Hello @dw-seanelliott, yes, IRSA is supported in Splunk 9.0.5+. There are a few things we are still working on, like private S3 bucket access, which should be available in an upcoming Splunk release.

raghukumarc commented 1 month ago

Hello @vivekr-splunk, can you confirm that the SmartStore configuration with the Splunk Operator on EKS works now? We have been facing certain issues with STS rsync() failures. Checking with AWS support, they confirmed that they are not able to find any of the STS calls in the CloudTrail logs. Please confirm that IRSA works for a SmartStore setup with the Splunk Operator configuration.