openshift / cluster-logging-operator

Operator to support logging subsystem of OpenShift
Apache License 2.0
105 stars 147 forks source link

Reconsider how collector nodeSelector is defined #58

Closed jcantrill closed 5 years ago

jcantrill commented 5 years ago

The recent submission of a PR to support rsyslog has made me question why we have not taken the opportunity in 4.0 to possibly change our node selector for the collector. Consider the following; Ref: https://github.com/openshift/cluster-logging-operator/blob/master/hack/cr.yaml

...
  collection:
    logCollection:
      type: "fluentd"
      fluentd:
        nodeSelector:
          logging-infra-fluentd: "true"
      rsyslog:
        nodeSelector:
          logging-infra-rsyslog: "true"

Wouldn't it be more reasonable to have a single label which takes a collector:

logging-infra-collector: "fluentd | rsyslog"

Additionally, moving to a 'well-known-label'='known value' pattern would allow us to remove the nodeselector block all together. I do not see a need for a customer to have to modify the selector

Either way allows us to split collectors across nodes, but the later would allow us to add additional collectors without requiring an additional label.

richm commented 5 years ago

The recent submission of a PR to support rsyslog has made me question why we have not taken the opportunity in 4.0 to possibly change our node selector for the collector. Consider the following; Ref: https://github.com/openshift/cluster-logging-operator/blob/master/hack/cr.yaml

...
  collection:
    logCollection:
      type: "fluentd"
      fluentd:
        nodeSelector:
          logging-infra-fluentd: "true"
      rsyslog:
        nodeSelector:
          logging-infra-rsyslog: "true"

Wouldn't it be more reasonable to have a single label which takes a collector:

logging-infra-collector: "fluentd | rsyslog"

Additionally, moving to a 'well-known-label'='known value' pattern would allow us to remove the nodeselector block all together. I do not see a need for a customer to have to modify the selector

Either way allows us to split collectors across nodes, but the later would allow us to add additional collectors without requiring an additional label.

* Why did we decide to continue to use a boolean to identify which nodes will receive the collector?

Inertia

* Additionally, node labeling is now outside the responsibility of this operator correct?  What is the workflow to get alternate collectors landed on nodes?

I have no idea - probably the "node-labeling-operator" whatever that is.

* One could additionally argue that `eventrouter` is a specialized collector. Is there a need to support a variant of landing multiple collectors on the same node?

You can have the eventrouter and fluentd/rsyslog running on the same node. You would typically have only 1 eventrouter per cluster, and it can run on any node - I guess like a deployment with replica restricted to be 1 or less.

ewolinetz commented 5 years ago

The other thing to consider is this -- we don't default that value at all. Whatever is provided by the customer on the CR we will use.

jcantrill commented 5 years ago

The other thing to consider is this -- we don't default that value at all. Whatever is provided by the customer on the CR we will use.

But why does the customer need to provide anything? A label is needed only so the platform can land the collector. Should it matter to the customer whether its the one we define or foo=bar? We should just make it something (probably namespaced even), remove it from the CR, and then better understand how the nodes are labeled. I assume there is no way currently for an admin to specify which nodes get labled

ewolinetz commented 5 years ago

But why does the customer need to provide anything? A label is needed only so the platform can land the collector. Should it matter to the customer whether its the one we define or foo=bar?

Right, but we cannot just know what nodes a customer wants logs collected from. This is one of the cases where we need to be told what they want so it can land where they need.

We should just make it something (probably namespaced even), remove it from the CR, and then better understand how the nodes are labeled. I assume there is no way currently for an admin to specify which nodes get labled

I believe there is a node operator that I imagine would have a way to map a node to labels. I'm not sure what you mean by making something [that is namespaced]...

jcantrill commented 5 years ago

We decided to default select to empty since the desire is to have fluent on all nodes but allow a specification if needed. fixed by #59. Node labeling is the responsibility of the machine config operator and is done according to machine classes. Admins would need to manually label node sets if they desired to only deploy fluent on a subset