stackabletech / spark-k8s-operator

Operator for Apache Spark-on-Kubernetes for Stackable Data Platform
https://stackable.tech

History Server spark.hadoop.fs.s3a.endpoint conf bug #315

Closed: johnfitzy closed this issue 7 months ago

johnfitzy commented 7 months ago

Affected version

23.11.0

Current and expected behavior

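When setting the host and port on the S3 connection, http:// is prepended to the endpoint in the spark-defaults.conf file, even though port 443 is used. For example, this line:

spark.hadoop.fs.s3a.endpoint http://s3.me.org:443

is generated from the following spec:

spec: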
  image:
    productVersion: 3.5.0
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket:
        inline:
          bucketName: spark-logs
          connection:
            inline:
              host: s3.me.org
              port: 443
              accessStyle: Path
              credentials:
                secretClass: history-credentials-class

Perhaps I have misconfigured something? I tried adding the tls object, but that did not change anything.

Possible solution

No response

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

maybe

razvan commented 7 months ago

Hey, thank you for reporting this.

According to the integration tests, this should work as expected. A working S3 configuration with TLS for the log directory looks like this (here the objects are referenced by name rather than inlined):

---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: spark-history-s3-connection
spec:
  host: eventlog-minio
  port: 9000
  accessStyle: Path
  credentials:
    secretClass: history-credentials-class
  tls:
    verification:
      server:
        caCert:
          secretClass: minio-tls-eventlog
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: spark-history-s3-bucket
spec:
  bucketName: spark-logs
  connection:
    reference: spark-history-s3-connection
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: minio-tls-eventlog
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-tls-eventlog
  labels:
    secrets.stackable.tech/class: minio-tls-eventlog
data:
  ca.crt: todo
  tls.crt: todo
  tls.key: todo
---
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: history-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: history-credentials
  labels:
    secrets.stackable.tech/class: history-credentials-class
stringData:
  accessKey: spark
  secretKey: sparkspark
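The history server then points at these objects by name instead of defining them inline. A minimal sketch of the relevant SparkHistoryServer fragment, assuming it is deployed in the same namespace as the S3Bucket above (other required fields such as image are omitted):

```yaml
spec:
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket:
        # Refers to the S3Bucket defined above instead of an inline definition.
        reference: spark-history-s3-bucket
```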

Also, looking at the code, the protocol is not selected based on the port but on the tls property of the bucket's connection: https:// is used when tls is set, http:// otherwise.
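Applied to the inline connection from the original report, this means the tls block belongs under connection.inline. A sketch, assuming s3.me.org presents a certificate signed by a public CA so that webPki verification is appropriate (for a private CA, use caCert.secretClass as in the example above):

```yaml
spec:
  image:
    productVersion: 3.5.0
  logFileDirectory:
    s3:
      prefix: eventlogs/
      bucket:
        inline:
          bucketName: spark-logs
          connection:
            inline:
              host: s3.me.org
              port: 443
              accessStyle: Path
              credentials:
                secretClass: history-credentials-class
              # Assumption: the endpoint's certificate chains to a public CA.
              # The presence of tls, not port 443, is what selects https://.
              tls:
                verification:
                  server:
                    caCert:
                      webPki: {}
```

With tls present, the endpoint in spark-defaults.conf should then be rendered with an https:// scheme instead of http://.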

HTH.

johnfitzy commented 7 months ago

Edit: this works, my bad. Thanks.