splunk / splunk-operator

Splunk Operator for Kubernetes

App Framework: Allow alternative CA authorities for S3 buckets #1103

Open gjanders opened 1 year ago

gjanders commented 1 year ago

Please select the type of request

Bug

Tell us more

Describe the request

2023-03-06T09:46:18.392485856Z  ERROR   GetAppListFromRemoteBucket      Unable to get apps list {"controller": "clustermanager", "controllerGroup": "enterprise.splunk.com", "controllerKind": "ClusterManager", "ClusterManager": {"name":"mobiles-cm","namespace":"mobiles"}, "namespace": "mobiles", "name": "mobiles-cm", "reconcileID": "517c5a41-f927-4e6c-9d11-4bd9a72a9753", "name": "mobiles-cm", "namespace": "mobiles", "appSource": "clusterApps", "error": "got an object error: Get \"https://10.x.x.x/k8s_mobile-idx-config/?location=\": x509: certificate signed by unknown authority for bucket: k8s_mobile-iot-idx-config"}

Expected behavior The S3 bucket should be accepted even if a company CA certificate is used on-prem to sign the server certificate (which is common for on-prem object storage).

Splunk setup on K8S Splunk operator 2.2.0

Reproduction/Testing steps

spec:
  appRepo:
    appInstallPeriodSeconds: 90
    appSources:
    - location: clusterApps/
      name: clusterApps
    appsRepoPollIntervalSeconds: 900
    defaults:
      scope: cluster
      volumeName: volume_app_repo_us
    installMaxRetries: 2
    volumes:
    - endpoint: https://10.x.x.x
      name: volume_app_repo_us
      path: k8s_mobile-iot-idx-config/
      provider: minio
      region: us-west-2
      secretRef: mobile-iot-s3-secret
      storageType: s3

K8s environment

k8s cluster

Proposed changes(optional)

Allow self-signed/CA signed certificates from s3 storage provider

K8s collector data(optional) Let me know if you need this

Additional context(optional) If there is a flag or switch I can use to disable the validation please let me know

vivekr-splunk commented 1 year ago

@gjanders we are looking into this issue. Do you want us to ignore certificate validation? Let me check whether we can make it a configurable value.

gjanders commented 1 year ago

I think that ignoring certificate validation may be the easiest option and I'm happy to use that. An alternative may be to add corporate CA certificates to the system, but I'm happy to disable certificate validation for now.

Thank you!

gjanders commented 1 year ago

I know this issue was logged very recently, but do you have a timeline, or a workaround I could use in the near future?

If not I will have to temporarily avoid the app framework to keep the project moving forward.

Thanks

vivekr-splunk commented 1 year ago

@gjanders I have a solution in mind, but I need some time to test it. Even if the solution works, we will only be releasing it in the first week of May. If you want to create a custom build from the code you can always do that; just change the config in pkg/splunk/client/awss3client.go

config := &aws.Config{
    Region:                        aws.String(region),
    MaxRetries:                    aws.Int(3),
    HTTPClient:                    &httpClient,
    DisableSSL:                    aws.Bool(true),
    LogLevel:                      aws.LogLevel(aws.LogDebug | aws.LogDebugWithHTTPBody | aws.LogDebugWithRequestErrors | aws.LogDebugWithSigning),
    Logger:                        aws.NewDefaultLogger(),
    CredentialsChainVerboseErrors: aws.Bool(true),
}

Remember, this is not tested, but I am hoping it will work.

gjanders commented 1 year ago

Thanks, I'll try to see if I can test this in the near future...

gjanders commented 1 year ago

I forgot to mention that I had switched to the minio provider because, with the AWS provider, my requests ended up on the AWS URL even when the endpoint pointed to a local IP address: k8s-config-testing.s3.ap-southeast-2.amazonaws.com

With endpoint: https://10....

Do you have a code snippet for minio I can test?

gjanders commented 1 year ago

I found that I could potentially modify transport.go for minio; on line 72 I added: InsecureSkipVerify: true, However, I couldn't determine how to re-package this back into the operator build, so after spending some time on it I gave up. I have also attempted to modify the AWS client to use the endpoint I specify, but so far without success.
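For readers who end up patching a transport as described above, here is a minimal standard-library sketch of the less drastic variant: instead of setting InsecureSkipVerify, the transport trusts an additional CA bundle. The helper names newCATransport and selfCheck are hypothetical; they are not part of minio-go or the operator, and how such a transport would be wired into either project is not shown here.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newCATransport (hypothetical helper) returns an HTTP transport that trusts
// the system roots plus the PEM-encoded CA certificates in caPEM.
func newCATransport(caPEM []byte) (*http.Transport, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool() // fall back to an empty pool
	}
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no PEM certificates found in CA bundle")
	}
	return &http.Transport{
		TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
	}, nil
}

// selfCheck starts a local TLS server with a self-signed certificate and
// confirms that a client built by newCATransport can reach it.
func selfCheck() error {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	// Stand-in for the company CA bundle: the test server's own certificate.
	caPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw})
	tr, err := newCATransport(caPEM)
	if err != nil {
		return err
	}
	defer tr.CloseIdleConnections()

	resp, err := (&http.Client{Transport: tr}).Get(srv.URL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	if err := selfCheck(); err != nil {
		fmt.Println("self-check failed:", err)
		return
	}
	fmt.Println("self-check passed")
}
```

Unlike InsecureSkipVerify, this keeps both chain and host name verification switched on, so it only helps when the server certificate actually matches the endpoint name or IP.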

vivekr-splunk commented 1 year ago

@gjanders once you download the code it's pretty straightforward

> go mod tidy
> make && make generate && make manifests
> make docker-build docker-push IMAGE_TAG_BASE=<docker repo name here>/splunk-operator VERSION=2.2.1 IMG=<docker repo name here>/splunk-operator:2.2.1 SPLUNK_ENTERPRISE_IMAGE=splunk/splunk:9.0.3-a2 

Once the docker image is created you can just use the image in your current installation

gjanders commented 1 year ago

So where I got stuck was getting the custom code into the build!

Adding InsecureSkipVerify: true, worked in transport.go, but I had to modify the github.com/minio/minio-go package to make it work as expected.

I'm now past the SSL errors, however. The AWS fix you mentioned doesn't help because the endpoint doesn't work on-prem; it's designed to use the aws.com URLs only.

gjanders commented 1 year ago

There is surely a nicer way to do this, but I ended up doing:

COPY minio-go/ minio-go/
RUN mkdir -p /usr/local/go/src
RUN cp -R /workspace/minio-go/ /usr/local/go/src
# Build
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o manager main.go

After adding my update to the transport.go of the minio-go package downloaded by the operator, I've then hit issue https://github.com/splunk/splunk-operator/issues/997 and I've merged that fix into my local copy of the operator as well.

I can now see it has attempted a bundle push so I'm much further along

SebastianDeiss commented 1 year ago

Any update on this? I am facing the same issue with an S3 storage using a private CA.
@vivekr-splunk: adding the private CA via a configuration value would be great.

rsomu commented 1 year ago

Yes, adding a private CA through a config map or similar would be a cleaner and more sustainable option.

rsomu commented 1 year ago

As a workaround I was able to get it to work by creating a configmap with my self-signed certificate and adding that as a volume into the splunk-operator-controller-manager deployment apps configuration.

kubectl create configmap ca-selfsigned-pemstore --from-file=ca-selfsigned-cert.pem

Here is the excerpt of the config.

        volumeMounts:
        - mountPath: /opt/splunk/appframework/
          name: app-staging
        - name: ca-selfsigned-pemstore
          mountPath: /etc/ssl/certs/ca-selfsigned-cert.pem
          subPath: ca-selfsigned-cert.pem
        ...
      serviceAccountName: splunk-operator-controller-manager
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          name: splunk-operator-config
        name: splunk-operator-config
      - name: ca-selfsigned-pemstore
        configMap:
          name: ca-selfsigned-pemstore

BTW, the provider should be renamed from minio to something generic such as on-prem: I am not using MinIO but Pure Storage FlashBlade in this case. I am surprised that all on-prem S3 is grouped under "minio", which is not correct.
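The ConfigMap workaround above needs no code change, presumably because the operator is a Go program: on Linux, Go's crypto/x509 package builds its default trust store from well-known bundle files and from the files in directories such as /etc/ssl/certs (the SSL_CERT_FILE and SSL_CERT_DIR environment variables can override these locations). The sketch below is a simplified imitation of that directory scan, not the standard library's actual code; loadCertDir and selfCheck are hypothetical names, and the test certificate is borrowed from httptest purely to have one to hand.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// loadCertDir (hypothetical helper) folds every PEM file in dir into a pool
// and returns the names of the entries that contributed a certificate.
func loadCertDir(dir string) (*x509.CertPool, []string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, nil, err
	}
	pool := x509.NewCertPool()
	var used []string
	for _, e := range entries {
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			continue // sub-directories and unreadable entries are skipped silently
		}
		if pool.AppendCertsFromPEM(data) {
			used = append(used, e.Name())
		}
	}
	return pool, used, nil
}

// selfCheck places one certificate file next to an empty sub-directory and
// confirms that only the file is picked up.
func selfCheck() error {
	dir, err := os.MkdirTemp("", "certs")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	srv := httptest.NewTLSServer(nil) // only used as a source of a certificate
	defer srv.Close()
	pemBytes := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: srv.Certificate().Raw})

	if err := os.WriteFile(filepath.Join(dir, "ca-selfsigned-cert.pem"), pemBytes, 0o644); err != nil {
		return err
	}
	if err := os.Mkdir(filepath.Join(dir, "empty-mount.pem"), 0o755); err != nil {
		return err
	}

	_, used, err := loadCertDir(dir)
	if err != nil {
		return err
	}
	if len(used) != 1 || used[0] != "ca-selfsigned-cert.pem" {
		return fmt.Errorf("unexpected result: %v", used)
	}
	return nil
}

func main() {
	_, used, err := loadCertDir("/etc/ssl/certs")
	if err != nil {
		fmt.Println("cannot read directory:", err)
		return
	}
	fmt.Printf("%d entries contributed certificates\n", len(used))
}
```

Note that an entry which is a directory rather than a file is skipped without any error, so a mis-mounted path fails quietly.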

SebastianDeiss commented 1 year ago

Awesome. @rsomu: Thanks for sharing your workaround. I agree that the provider should not be called minio, since there is other on-prem S3 storage too.

gjanders commented 1 year ago

Thank you for the workaround. That gets me closer but doesn't solve the issue: x509: certificate is valid for , not for bucket:

So that gets me past the SSL chain validation, and now I just need to disable hostname / CN validation.

Would you happen to know how to do that?

In my case we're using a round-robin DNS to point to multiple backend servers, each with a unique SSL cert signed by a common CA.
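In plain Go, one way to keep chain validation while dropping the host name check is to set InsecureSkipVerify and redo the chain verification in the tls.Config VerifyConnection callback (available since Go 1.15). The sketch below shows only that standard-library pattern; chainOnlyTLSConfig, get and selfCheck are hypothetical names, and the operator does not expose a setting for this as far as this thread shows, so using it would still mean a custom build.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// chainOnlyTLSConfig (hypothetical helper) verifies the server's chain
// against roots but does not compare the certificate with the host name.
func chainOnlyTLSConfig(roots *x509.CertPool) *tls.Config {
	return &tls.Config{
		// Turns off the built-in chain and host name checks; the chain
		// check is re-implemented in VerifyConnection below.
		InsecureSkipVerify: true,
		VerifyConnection: func(cs tls.ConnectionState) error {
			if len(cs.PeerCertificates) == 0 {
				return errors.New("server presented no certificate")
			}
			opts := x509.VerifyOptions{
				Roots:         roots,
				Intermediates: x509.NewCertPool(),
				// DNSName is left empty, so no host name is checked.
			}
			for _, c := range cs.PeerCertificates[1:] {
				opts.Intermediates.AddCert(c)
			}
			_, err := cs.PeerCertificates[0].Verify(opts)
			return err
		},
	}
}

// get issues one request using cfg and reports any error.
func get(url string, cfg *tls.Config) error {
	tr := &http.Transport{TLSClientConfig: cfg}
	defer tr.CloseIdleConnections()
	resp, err := (&http.Client{Transport: tr}).Get(url)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// selfCheck compares strict and chain-only verification against a local
// test server whose certificate does not cover the requested name.
func selfCheck() error {
	srv := httptest.NewTLSServer(http.HandlerFunc(func(http.ResponseWriter, *http.Request) {}))
	defer srv.Close()
	roots := x509.NewCertPool()
	roots.AddCert(srv.Certificate())
	const wrongName = "storage.internal.invalid" // not in the test certificate

	strict := &tls.Config{RootCAs: roots, ServerName: wrongName}
	if get(srv.URL, strict) == nil {
		return errors.New("strict config unexpectedly accepted the wrong host name")
	}

	relaxed := chainOnlyTLSConfig(roots)
	relaxed.ServerName = wrongName
	if err := get(srv.URL, relaxed); err != nil {
		return fmt.Errorf("chain-only config failed: %w", err)
	}

	untrusted := chainOnlyTLSConfig(x509.NewCertPool())
	if get(srv.URL, untrusted) == nil {
		return errors.New("chain-only config accepted an untrusted CA")
	}
	return nil
}

func main() {
	if err := selfCheck(); err != nil {
		fmt.Println("self-check failed:", err)
		return
	}
	fmt.Println("self-check passed")
}
```

The trade-off is that any certificate signed by the trusted CA is accepted for any host, which may be acceptable behind a round-robin name with a private CA but is weaker than issuing certificates that include the shared name.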

gjanders commented 10 months ago

We managed to get access to update the SSL certificates of the s3 solution, so I can confirm this workaround has worked for me! The ability to disable SSL validation completely would be a nice to have...

Thanks

hoyosb commented 2 months ago

I'm loving the work that has been done here and was intending to use this solution (Splunk Operator) for our deployments going forward (we're in a disconnected environment, running OpenShift with OpenShift Data Foundation). We've reached the point where we're trying to deploy apps and configure the cluster (c3 deployment), but have discovered that there isn't really a way to define custom CA certs for services such as the App Framework, which prevents us from using the cluster in any meaningful way. I've even tried the above fix and it doesn't seem to solve my issue: the operator manager still logs the error about the unknown certificate authority. Was there anything else to the workaround mentioned above, or are there any updates on the horizon to make this work in disconnected environments with private S3 clusters (whether it's MinIO or something else) and private CAs?

gjanders commented 2 months ago

My deployment for the splunk-operator shows:

        volumeMounts:
        - mountPath: /etc/ssl/certs/ca-selfsigned-cert.pem
          name: ca-selfsigned-pemstore
          subPath: ca-selfsigned-cert.pem
        - mountPath: /opt/splunk/appframework/
          name: app-staging

      volumes:
      - configMap:
          defaultMode: 420
          name: ca-selfsigned-pemstore
        name: ca-selfsigned-pemstore
      - name: app-staging
        persistentVolumeClaim:
          claimName: splunk-operator-app-download

Is that what you see?

/etc/ssl/certs/ca-selfsigned-cert.pem exists in the manager container and shows my company signed SSL certificate chain.

Prior to this solution I built the operator from source and removed the SSL verification but that is much more work than the above solution...

hoyosb commented 2 months ago

I see what I did wrong: when I created my configmap, the name of my source file was ca.crt instead of ca-selfsigned-cert.pem. This resulted in the mount for the container simply being an empty directory. Once I updated my configmap, things started to work properly. Thanks for the help, greatly appreciated!
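A quick way to catch this kind of mismatch is to check that the mounted path is a regular file containing at least one certificate. The sketch below does that with the Go standard library; inspectCAFile is a hypothetical helper, and the path in main is the one from the deployment excerpts in this thread, so adjust it if your mount differs.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// inspectCAFile (hypothetical helper) returns the subject of every
// certificate found in the PEM file at path.
func inspectCAFile(path string) ([]string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return nil, err
	}
	if info.IsDir() {
		// Symptom described above: the key in the ConfigMap did not match subPath.
		return nil, fmt.Errorf("%s is a directory, not a PEM file", path)
	}
	rest, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var subjects []string
	for {
		var block *pem.Block
		block, rest = pem.Decode(rest)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, fmt.Errorf("parsing certificate in %s: %w", path, err)
		}
		subjects = append(subjects, cert.Subject.String())
	}
	if len(subjects) == 0 {
		return nil, errors.New("no certificates found in " + path)
	}
	return subjects, nil
}

func main() {
	subjects, err := inspectCAFile("/etc/ssl/certs/ca-selfsigned-cert.pem")
	if err != nil {
		fmt.Println("problem:", err)
		return
	}
	for _, s := range subjects {
		fmt.Println("found CA:", s)
	}
}
```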