stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository
2 stars 0 forks source link

Generic Database connection design #238

Open sbernauer opened 2 years ago

sbernauer commented 2 years ago

We want a generic way of for database connections like mysql, postgres, mongodb

This is done when

### Tasks
- [ ] https://github.com/stackabletech/documentation/pull/321
- [ ] https://github.com/stackabletech/druid-operator/issues/352
- [ ] https://github.com/stackabletech/hive-operator/issues/148

Please note that this list only includes the products that currently define the credentials within the Product CRD (which is very bad). There are other operators, which should be adopted to the new connection design as well for consistency.

Current Status

Airflow

Druid

HBase

Hadoop HDFS

Hive

Kafka

NiFi

OPA

Spark

Superset

Trino

ZooKeeper

These are my findings. I hope this is complete. Does not look as bad or as much as expected.

fhennig commented 1 year ago

@razvan I'm not sure this can be closed, there are still open tasks

razvan commented 1 year ago

It was closed automatically by merging the ADR PR. I missed the comment.

Maleware commented 1 year ago

As long as this is not implemented you can workaround with Podovrerrites e.g. like this:

To avoid searching, this is the line added to args: sed -i "s|XX_PASS_XX|$(cat /tmp/postgres-password/password)|g" /stackable/config/hive-site.xml

---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive
  namespace: my-namespace
spec:
  image:
    productVersion: 3.1.3
    stackableVersion: "23.7.0"
  clusterConfig:
    database:
      connString: jdbc:postgresql://my-postgress-connector/hive
      user: hive
###### This is what will be taken from a secret ########
      password: XX_PASS_XX
      dbType: postgres
    s3:
      reference: ionos
  metastore:
    config:
      resources:
        memory:
          limit: '1Gi'
####### Podoverrides to mount the volume and sed the pw from the secret
    podOverrides:
      spec:
        containers:
          - name: hive
            args:
              - echo copying /stackable/mount/config to /stackable/config && cp -RL /stackable/mount/config/* /stackable/config && echo copying /stackable/mount/log-config/hive-log4j2.properties to /stackable/config/hive-log4j2.properties && cp -RL /stackable/mount/log-config/hive-log4j2.properties /stackable/config/hive-log4j2.properties && keytool -importkeystore -srckeystore /etc/pki/java/cacerts -srcstoretype jks -srcstorepass changeit -destkeystore /stackable/truststore.p12 -deststoretype pkcs12 -deststorepass changeit -noprompt && echo replacing xxx_access_key_xxx and xxx_secret_key_xxx with secret values. && sed -i "s|xxx_access_key_xxx|$(cat /stackable/secrets/accessKey)|g" /stackable/config/hive-site.xml && sed -i "s|xxx_secret_key_xxx|$(cat /stackable/secrets/secretKey)|g" /stackable/config/hive-site.xml && sed -i "s|XX_PASS_XX|$(cat /tmp/postgres-password/password)|g" /stackable/config/hive-site.xml && bin/start-metastore --config /stackable/config --db-type postgres --hive-bin-dir bin
            volumeMounts:
              - mountPath: /tmp/postgres-password/
                name: postgres-password
        volumes:
          - name: postgres-password
            secret:
              secretName: hive-postgres-secret
      default:
        replicas: 1
ruslanguns commented 9 months ago
  • echo copying /stackable/mount/config to /stackable/config && cp -RL /stackable/mount/config/* /stackable/config && echo copying /stackable/mount/log-config/hive-log4j2.properties to /stackable/config/hive-log4j2.properties && cp -RL /stackable/mount/log-config/hive-log4j2.properties /stackable/config/hive-log4j2.properties && keytool -importkeystore -srckeystore /etc/pki/java/cacerts -srcstoretype jks -srcstorepass changeit -destkeystore /stackable/truststore.p12 -deststoretype pkcs12 -deststorepass changeit -noprompt && echo replacing xxx_access_key_xxx and xxx_secret_key_xxx with secret values. && sed -i "s|xxx_access_key_xxx|$(cat /stackable/secrets/accessKey)|g" /stackable/config/hive-site.xml && sed -i "s|xxx_secret_key_xxx|$(cat /stackable/secrets/secretKey)|g" /stackable/config/hive-site.xml && sed -i "s|XX_PASS_XX|$(cat /tmp/postgres-password/password)|g" /stackable/config/hive-site.xml && bin/start-metastore --config /stackable/config --db-type postgres --hive-bin-dir bin

I would like to comment that this workaround has worked perfectly for me. Although I would like to understand a little bit how it works underneath., as I can appreciate that there is more to it than just replacing the database password with the placeholder, @Maleware could you comment on why?

razvan commented 9 months ago

Hive expects the password in hive-site.xml. We have a policy of not reading product related secrets in the operators. So we put a placeholder in hive-site.xml and use sed to replace it in the container.

When we have implemented this ADR, the mechanism will hopefully be a bit more elegant and transparent. Until then, this is all there is to it.

Maleware commented 9 months ago

Hey @ruslanguns hope this reaches you well.

Yeah indeed, that's a fix we brought up for a customer for the time being.

So for the part as a example sed -i "s|XX_PASS_XX|$(cat /tmp/postgres-password/password)|g" /stackable/config/hive-site.xml. Others behave similar.

This doing nothing else but propagating the password from postgres-password/password ( path to a mounted secret volume ) and seding it into hive-site.xml. This is because while on gitops, you want password security and manage passwords as part of your IaC infrastructure. That's the part I've done for them and the one you interfered already correctly!

Anyway, some other commands are about to set up the keystore ( build it from scratch ) and set explicitly which type of database you are looking for. @sbernauer is a way more thorough expert then I am on this

ruslanguns commented 9 months ago

Now I understand, so this script partly does the replacements that the same operator usually does in the hive-site.xml file including not just the XX_PASS_XX placeholder but also the base config and the use of the S3 credentials placeholders as well, now it all makes a lot of more sense to me. Thanks to both of you for the explanation, really appreaciated. @razvan & @Maleware

We use GitOps with FluxCD and indeed, this workaround has allowed us not to expose passwords in the repository in an insecure way.

apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive
  namespace: infrastructure
spec:
  image:
    productVersion: 3.1.3
  clusterConfig:
    database:
      connString: jdbc:postgresql://postgres-databases.infrastructure.svc.cluster.local:5432/hive
      user: hive_owner_user
      password: __PASS__
      dbType: postgres
    s3:
      reference: spark-s3-connection
  metastore:
    roleGroups:
      default:
        replicas: 1
    podOverrides:
      spec:
        containers:
        - name: hive
          args:
          - |
            echo "Copying configuration from /stackable/mount/config to /stackable/config"
            cp -RL /stackable/mount/config/* /stackable/config

            echo "Copying Hive log4j2 properties file from /stackable/mount/log-config to /stackable/config"
            cp -RL /stackable/mount/log-config/hive-log4j2.properties /stackable/config/hive-log4j2.properties

            echo "Importing Java keystore to PKCS12 format for truststore"
            keytool -importkeystore \
              -srckeystore /etc/pki/java/cacerts \
              -srcstoretype jks -srcstorepass changeit \
              -destkeystore /stackable/truststore.p12 \
              -deststoretype pkcs12 \
              -deststorepass changeit -noprompt

            echo "Replacing placeholders with secret values in hive-site.xml"
            sed -i "s|xxx_access_key_xxx|$(cat /stackable/secrets/accessKey)|g" /stackable/config/hive-site.xml
            sed -i "s|xxx_secret_key_xxx|$(cat /stackable/secrets/secretKey)|g" /stackable/config/hive-site.xml
            sed -i "s|__PASS__|$(cat /tmp/postgres-password/password)|g" /stackable/config/hive-site.xml

            echo "Starting Hive Metastore"
            bin/start-metastore --config /stackable/config --db-type postgres --hive-bin-dir bin

          volumeMounts:
          - mountPath: /tmp/postgres-password/
            name: postgres-password
        volumes:
        - name: postgres-password
          secret:
            secretName: hive-owner-user.postgres-databases.credentials.postgresql.acid.zalan.do