openshift / cluster-logging-operator

Operator to support logging subsystem of OpenShift
Apache License 2.0

Restructure 'all-in-one' as it's currently defined before 4.0 release #66

Closed jcantrill closed 5 years ago

jcantrill commented 5 years ago

Overview

While working through:

I fundamentally believe the approach we are taking to configure split clusters repeats the same problem we had with the deployer, ansible, and now the operator. Prior to feature freeze for 4.0, we must re-evaluate the current CR, as it will become an API we will need to maintain for a while going forward.

Issue

We currently treat the split scenario (apps to one cluster, infra to another) as a special case. The implementation depends on an annotation for which we introduce 'if' checks (i.e. the elasticsearch case) in multiple places. This is contrary to the advice we received several releases ago to consider how we might treat these cases as the same thing but different instances (e.g. the class and object metaphor). With regard to the applications and operations Elasticsearch stacks (i.e. ES, Kibana, curator), there is no difference between the two besides the name. By subtly altering how we represent these use cases in the CR, we can remove the special-case nature of the current design. This should simplify the code.

Proposal

This proposal is a variant of one of the alternatives listed below. It would introduce an additional level of hierarchy to group stacks accordingly (allowing additional ones in the future if that makes sense), and configure message routing in the collector. This change would also allow us to treat clusters uniformly:

Clusters

apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  managementState: "Managed"
  stacks:
    - name: app
      type: elastic
      elastic:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            dataReplication: "NoReplication"
        visualization:
          type: "kibana"
          kibana:
            replicas: 1
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"
    - name: infra
      type: elastic
      elastic:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            dataReplication: "NoReplication"
        visualization:
          type: "kibana"
          kibana:
            replicas: 1
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"
...
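
To illustrate the "same class, different instance" idea, here is a minimal Go sketch of how an operator might walk the stacks list. All type, field, and function names (Component, ElasticSpec, Stack, planStack) are hypothetical and only mirror the YAML above; they are not taken from the operator's code, and per-component settings are omitted. The point is that every stack goes through the same code path, with no annotation check for the split case.

```go
package main

import "fmt"

// Component is a hypothetical stand-in for one entry of a stack
// (logStore, visualization, curation). Only the type is modeled here.
type Component struct {
	Type string // e.g. "elasticsearch", "kibana", "curator"
}

// ElasticSpec mirrors the "elastic" section of a stack (assumed shape).
type ElasticSpec struct {
	LogStore      Component
	Visualization Component
	Curation      Component
}

// Stack mirrors one entry of spec.stacks (assumed shape).
type Stack struct {
	Name    string
	Type    string
	Elastic *ElasticSpec
}

// planStack returns the component types to reconcile for a single stack.
// The same function handles "app" and "infra"; only the name differs.
func planStack(s Stack) ([]string, error) {
	switch s.Type {
	case "elastic":
		if s.Elastic == nil {
			return nil, fmt.Errorf("stack %q: type %q requires an elastic section", s.Name, s.Type)
		}
		return []string{
			s.Elastic.LogStore.Type,
			s.Elastic.Visualization.Type,
			s.Elastic.Curation.Type,
		}, nil
	default:
		return nil, fmt.Errorf("stack %q: unsupported type %q", s.Name, s.Type)
	}
}

func main() {
	elastic := &ElasticSpec{
		LogStore:      Component{Type: "elasticsearch"},
		Visualization: Component{Type: "kibana"},
		Curation:      Component{Type: "curator"},
	}
	stacks := []Stack{
		{Name: "app", Type: "elastic", Elastic: elastic},
		{Name: "infra", Type: "elastic", Elastic: elastic},
	}
	for _, s := range stacks {
		components, err := planStack(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s.Name, components)
	}
}
```

Adding a third stack in this sketch would only mean appending another list entry; nothing in planStack would need to change.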

One could suggest a further optimization: since we know stacks[].type, we no longer need component types, because we will ALWAYS have the same components in a given cluster type (e.g. Elasticsearch, Kibana, Curator):

apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  managementState: "Managed"
  stacks:
    - name: app
      type: elastic
      elastic:
        logStore:
          resources:
            request:
            limits:
          dataReplication: "NoReplication"
        visualization:
          resources:
            request:
            limits:
          replicas: 1
        curation:
          resources:
            request:
            limits:
          schedule: "30 3 * * *"
    - name: infra
      type: elastic
      elastic:
        logStore:
        visualization:
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"

What's in a name

Ideally, we would use the name as either the name for all dependent resources or as a suffix to the resources the operator creates (e.g. elasticsearch-infra). Alternatively, we might consider only applying the suffix (as we do now) when there are multiple cluster definitions. Additionally we should consider only supporting the names: apps, infra, since they have special meaning.
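
The two naming options could look roughly like the Go sketch below. Both helpers (resourceName, isSupportedStackName) are hypothetical names invented for illustration, and the choice to suffix only when more than one stack exists is just the alternative described above, not a settled decision.

```go
package main

import "fmt"

// resourceName is a hypothetical helper. It appends the stack name as a
// suffix (e.g. "elasticsearch-infra") only when several stacks are defined;
// with a single stack the base name is used unchanged.
func resourceName(base, stack string, stackCount int) string {
	if stackCount <= 1 {
		return base
	}
	return fmt.Sprintf("%s-%s", base, stack)
}

// isSupportedStackName is a hypothetical check that restricts stack names
// to the two values with special meaning mentioned above.
func isSupportedStackName(name string) bool {
	return name == "apps" || name == "infra"
}

func main() {
	fmt.Println(resourceName("elasticsearch", "infra", 2))
	fmt.Println(resourceName("elasticsearch", "apps", 1))
	fmt.Println(isSupportedStackName("audit"))
}
```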

Collectors

Initially, message routing would require us to make some opinionated assumptions based on the deployed clusters:

chancez commented 5 years ago

👍 to your design. I like it

jcantrill commented 5 years ago

Working example in #67. Still needs some additional testing:

jcantrill commented 5 years ago

Closing as we have decided not to manage multiple clusters in this fashion.