stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository
2 stars 0 forks source link

Consolidate encryption, authentication and authorization in operators #293

Closed maltesander closed 1 year ago

maltesander commented 1 year ago

After a discussion with @lfrancke we want to consolidate the way we handle encryption, authentication and authorization. This means having a unified global structure across all operators.

This will result in breaking changes for the released versions of ZooKeeper and Superset according to the feature tracker.

Problem

1. Mixing encryption and authentication

Currently Kafka and Zookeeper mix TLS encryption and authentication as follows (see the client_authentication part):

pub struct GlobalKafkaConfig {
    /// Only affects client connections. This setting controls:
    /// - If TLS encryption is used at all
    /// - Which cert the servers should use to authenticate themselves against the client
    /// Defaults to `TlsSecretClass` { secret_class: "tls".to_string() }.
    #[serde(
        default = "tls_secret_class_default",
        skip_serializing_if = "Option::is_none"
    )]
    pub tls: Option<TlsSecretClass>,
    /// Only affects client connections. This setting controls:
    /// - If clients need to authenticate themselves against the server via TLS
    /// - Which ca.crt to use when validating the provided client certs
    /// Defaults to `None`
    #[serde(skip_serializing_if = "Option::is_none")]
    pub client_authentication: Option<ClientAuthenticationClass>,
    /// Only affects internal communication. Use mutual verification between Kafka nodes
    /// This setting controls:
    /// - Which cert the servers should use to authenticate themselves against other servers
    /// - Which ca.crt to use when validating the other server
    #[serde(
        default = "tls_secret_class_default",
        skip_serializing_if = "Option::is_none"
    )]
    pub internal_tls: Option<TlsSecretClass>,
}

This was ok for now since we did not support any other authentication methods.

Now if we want to support different authentication methods e.g. LDAP, this will get a little messy.

2. No global top level config

Other products do not use a "global" config. Authentication (LDAP) resides in the top level, OPA discovery configmap (authorization) as well. Example from superset:

pub struct SupersetClusterSpec {
    /// Emergency stop button, if `true` then all pods are stopped without affecting configuration (as setting `replicas` to `0` would)
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub stopped: Option<bool>,
    /// Desired Superset version
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub version: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub statsd_exporter_version: Option<String>,
    pub credentials_secret: String,
    pub mapbox_secret: Option<String>,
    #[serde(default)]
    pub load_examples_on_init: Option<bool>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub authentication_config: Option<SupersetClusterAuthenticationConfig>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
...
}

This should be consistent across all operators.

Proposed solution

Therefore we propose the following clean "common" operator config (we need to discuss what we could move to operator-rs), where we clearly separate encryption, authentication and authorization:

version: ....
clusterConfig:
  tls/encryption:
    client: secret_class...
    internal: secret_class
  authentication: 
    (m)Tls: authentication_class
    ldap: authentication_class
  # or
  authentication: 
    - authentication_class (tls)
    - authentication_class (ldap)
  authorization: 
    opa:
      configmap: ....

Outcome

We decided to use the following spec (example represents mostly Trino/Superset, mixed with random fields). This should be properly refined in the sub tickets for each operator.

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  version: 396-stackable0.1.0
  clusterConfig:
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino 
    # Just copy paste existing structure 
    # In case of Trino (easiest)
    authentication:
      method:
        ldap: ...
    # We recommend the following list of authenticationClasses
    # If this recommendation does not match a particular product (e.g. no authentication at all) it can do it differently
    # (1) In case of Trino (intermediate solution as it is quicker to implement)
    authentication: # TODO call it clientAuthentication?
      - authenticationClass: authentication_class (tls)
      - authenticationClass: authentication_class (ldap)
      - multiUser: <secret-name>
    # (2) OR in case of Trino (final solution)
    authentication:
      - authenticationClass: trino-ldap-authentication-class (tls) # String
      - authenticationClass: authentication_class (ldap)
      - authenticationClass: authentication_class (static)
    # (1) In case of Superset (solution that is quicker to implement)
    authentication:
      authenticationClass: my-ldap (ldap)
      userRegistration: true
      userRegistration_role: Admin
      syncRolesAt: Login
    # (2) OR in case of Superset (alternative solution)
    authentication:
      - authenticationClass: my-ldap (ldap)
        userRegistration: true
        userRegistration_role: Admin
    authorization: # TODO define
    # Maybe a AuthorizationClass (probably not)?
    # How should we handle group memberships?
    # E.g. for LDAP give the option of providing a LDAP AuthenticationClass to retrieve ldap group membership?

    # For now we want to start with the following opa section for all the products that currently support opa
    # Individual struct in every operator, no operator-rs struct
      opa:
        configmap: my-opa
        package: simple-trino

    # We don't want to start with an operator-rs struct as we have products that don't support tls at all,
    # some only server tls and some server + internal tls. We can later on non-breaking move it to a operator-rs
    # struct if we want
    tls:
      serverSecretClass: secret_class # Option<String>. *In general* defaults to "tls"
      internalSecretClass: secret_class # Option<String>. *In general* defaults to "tls"
    # Not Trino/Superset related 
    zookeeperConfigMapName: xyz  
    deepStorage: ...
  coodinators: ....  

Since all operators are currently inconsistent, we decided to implement a final approach for the Druid operator. Druid because it has a clean slate regarding TLS and Authentication and is able to use OPA as well.

The final approach (concerning authentication) are the fields annotated with (2). This is our most sophisticated approach and therefore will be prototyped on Druid.

Furthermore all top level configuration parameters except version/image (Deepstorage, S3, Discovery configmaps etc.) should be moved under the clusterConfig top level.

Tasks

Note: Github Taskslist may break the order, check before pulling new tasks in!

### Tasks
- [x] https://github.com/stackabletech/operator-rs/issues/494
- [x] https://github.com/stackabletech/druid-operator/issues/6
- [x] https://github.com/stackabletech/druid-operator/issues/144
- [x] https://github.com/stackabletech/zookeeper-operator/issues/596
- [x] https://github.com/stackabletech/kafka-operator/issues/529
- [ ] https://github.com/stackabletech/hdfs-operator/issues/289
- [ ] https://github.com/stackabletech/hbase-operator/issues/329
- [x] https://github.com/stackabletech/hive-operator/issues/291
- [x] https://github.com/stackabletech/airflow-operator/issues/247
- [x] https://github.com/stackabletech/trino-operator/issues/395
- [x] https://github.com/stackabletech/trino-operator/issues/401
- [x] https://github.com/stackabletech/superset-operator/issues/339
- [ ] https://github.com/stackabletech/nifi-operator/pull/417
- [ ] https://github.com/stackabletech/airflow-operator/issues/298
- [ ] https://github.com/stackabletech/nifi-operator/issues/485

Followups

maltesander commented 1 year ago

Current status

Airflow

Druid

HBase

Hadoop HDFS

Hive

Kafka

NiFi

Currently:

authentication:
  method:
    authenticationClass: ldap

Should be:

authentication:
  - authenticationClass: ldap

OPA

Spark

not applicable?

Superset

Trino

ZooKeeper

adwk67 commented 1 year ago

Minor addition to the above: the airflow secret also contains credentials for a Redis connection (this is needed when using the CeleryExecutor - currently the only implementation, soon to be augmented by a KubernetesExecutor - as it queues the jobs to be distributed to the workers). This connection can/should also be removed from the secret, though it's not strictly a database connection. Maybe rename "Database connection" to "Sink Connection", "Backend connection" etc.?