jcantrill commented 5 years ago

Overview

While working through:

standing up cluster-logging
configuration options
documentation
reviewing code

I fundamentally believe the approach we are taking to configure split clusters repeats the same problem we had with the deployer, Ansible, and now the operator. Before feature freeze for 4.0, we must re-evaluate the current CR, as it will become an API we need to maintain for a while going forward.

Issue

We currently treat the split scenario (apps to one cluster, infra to another) as a special case. The implementation depends on an annotation, for which we introduce 'if' checks (i.e. the elasticsearch case) in multiple places. This is contrary to the advice we received several releases ago to consider how we might treat these cases as the same thing but different instances (e.g. the class and object metaphor). With regard to the applications and operations Elasticsearch stacks (i.e. ES, Kibana, Curator), there is no difference between the two besides the name. By subtly altering how we represent these use cases in the CR, we can remove the special-case nature of the current design. This should simplify the code.
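To illustrate the class-and-object idea, here is a minimal Python sketch. It is not the operator's code: the `Stack` type, the `COMPONENTS` tuple, and the `plan` function are hypothetical names chosen for illustration. The point is only that one code path can serve every stack, with the stack name as the sole varying input.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: every Elasticsearch stack consists of the same three components.
COMPONENTS = ("elasticsearch", "kibana", "curator")


@dataclass(frozen=True)
class Stack:
    """One instance of the stack 'class'; only the name differs."""
    name: str  # e.g. "app" or "infra"


def plan(stacks: List[Stack]) -> List[Tuple[str, str]]:
    """Return (stack name, component) pairs to reconcile.

    There is no branch on whether a stack is the infra stack:
    every stack goes through the same loop.
    """
    return [(stack.name, component)
            for stack in stacks
            for component in COMPONENTS]
```

Calling `plan([Stack("app"), Stack("infra")])` yields six pairs, three per stack, without any stack-specific condition.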

Proposal

This proposal is a variant of one of the alternates listed below. It would introduce an additional level of hierarchy to group stacks (allowing additional ones in the future if that makes sense) and configure message routing in the collector. This change would also allow us to treat clusters uniformly:

Clusters

apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  managementState: "Managed"
  stacks:
    - name: app
      type: elastic
      elastic:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            dataReplication: "NoReplication"
        visualization:
          type: "kibana"
          kibana:
            replicas: 1
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"
    - name: infra
      type: elastic
      elastic:
        logStore:
          type: "elasticsearch"
          elasticsearch:
            dataReplication: "NoReplication"
        visualization:
          type: "kibana"
          kibana:
            replicas: 1
        curation:
          type: "curator"
          curator:
            schedule: "30 3 * * *"
...

One could suggest a further optimization: since we know stacks[].type, we no longer need component types, because a given cluster type will ALWAYS have the same components (e.g. Elasticsearch, Kibana, Curator):

apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  managementState: "Managed"
  stacks:
    - name: app
      type: elastic
      elastic:
        logStore:
          resources:
            request:
            limits:
          dataReplication: "NoReplication"
        visualization:
          resources:
            request:
            limits:
          replicas: 1
        curation:
          resources:
            request:
            limits:
          schedule: "30 3 * * *"
    - name: infra
      type: elastic
      elastic:
        logStore:
        visualization:
        curation:
          schedule: "30 3 * * *"

What's in a name

Ideally, we would use the name either as the name for all dependent resources or as a suffix to the resources the operator creates (e.g. elasticsearch-infra). Alternatively, we might apply the suffix (as we do now) only when there are multiple cluster definitions. Additionally, we should consider supporting only the names app and infra, since they have special meaning.
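The naming options above can be sketched as a single helper. This is a rough Python illustration under stated assumptions: `resource_name`, `ALLOWED_NAMES`, and the `always_suffix` flag are hypothetical and not part of the operator, and the allowed names follow the `name` values used in the YAML examples earlier in this issue.

```python
# Assumption: only these stack names are supported, as suggested above.
ALLOWED_NAMES = {"app", "infra"}


def resource_name(component: str, stack_name: str, stack_count: int,
                  always_suffix: bool = False) -> str:
    """Build the name of a resource the operator would create.

    always_suffix=True models "always append the stack name";
    the default models "append it only when several stacks are defined".
    """
    if stack_name not in ALLOWED_NAMES:
        raise ValueError(f"unsupported stack name: {stack_name!r}")
    if always_suffix or stack_count > 1:
        return f"{component}-{stack_name}"
    return component
```

With two stacks defined, `resource_name("elasticsearch", "infra", 2)` gives `elasticsearch-infra`; with a single stack and the default flag, the bare component name is kept.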

Collectors

Initially, message routing would require us to make some opinionated assumptions based on the deployed clusters:

Single cluster: all messages route here
Multiple clusters: app logs -> app, infra -> infra

In the future, we could introduce a way to define where messages are routed, but that is intentionally absent here.
```
apiVersion: "logging.openshift.io/v1alpha1"
kind: "ClusterLogging"
metadata:
  name: "cluster-logging"
spec:
  collection:
    logCollection:
      type: "fluentd"
      fluentd:
        nodeSelector:
          logging-infra-fluentd: "true"
```
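The routing assumptions described in this section could be expressed roughly as follows. This is a Python sketch only, not the collector's configuration logic; the `route` function and its arguments are hypothetical, and the error cases (no stacks, or no matching stack) are assumptions the issue does not address.

```python
from typing import List


def route(log_type: str, stacks: List[str]) -> str:
    """Pick the destination stack for a log message.

    log_type: "app" or "infra" (the kind of log being routed).
    stacks:   names of the deployed stacks.
    """
    if not stacks:
        # Assumption: with nothing deployed there is nowhere to send logs.
        raise ValueError("no stacks deployed")
    if len(stacks) == 1:
        # Single cluster: all messages route here.
        return stacks[0]
    if log_type in stacks:
        # Multiple clusters: app logs -> app, infra logs -> infra.
        return log_type
    raise ValueError(f"no stack for log type {log_type!r}")
```

For example, `route("infra", ["app"])` returns `"app"` because only one stack exists, while `route("infra", ["app", "infra"])` returns `"infra"`.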
Alternates

Multiple CRs, one for each cluster

Ref: https://gist.github.com/jcantrill/4a9365170f32f72ed57c83f6bb566b4f#file-gistfile1-txt-L27

Cons
Requires the cluster admin to 'wire' logs from the collector to various destinations.
No inherent relation between multiple CRs/clusters in a single 'cluster logging' setup

Single CR with named sources

https://gist.github.com/chancez/6f326e68412dbe760aeffd2be7ea5adf

Cons
Introduces named clusters in a way that is limiting (e.g. infraLog, appLog)

chancez commented 5 years ago

👍 to your design. I like it

jcantrill commented 5 years ago

Working example in #67. Still needs some additional testing:

Only deploy stack component when present
Override destination (e.g. infra off cluster)

jcantrill commented 5 years ago

Closing as we have decided not to manage multiple clusters in this fashion.

openshift / cluster-logging-operator

Restructure 'all-in-one' as its currently defined before 4.0 release #66
