kube-logging / logging-operator

Logging operator for Kubernetes
https://kube-logging.dev
Apache License 2.0

Loki output: set tenant header based on k8s label instead of static value #1145

Closed: atamgp closed this issue 1 year ago

atamgp commented 1 year ago

Describe the solution you'd like: I want to define one default Loki ClusterOutput that does not use a static tenant. Because the output is cluster-level, tenants don't get to see its config, and with 100 tenants I don't need to create 100 namespaced Outputs with the same Loki endpoint and auth config, which is insecure and harder to maintain when something changes.

For example:

apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: default
spec:
  loki:
    url: http://loki:3100
    tenant_from_label: "capsule.clastix.io/tenant"  # proposed new field
    configure_kubernetes_labels: true
    buffer: ...

Additional context: We run multi-tenant k8s clusters and use Capsule to create Tenants in the cluster. How Capsule works internally isn't relevant; what matters is that namespaces and pods carry a label with their tenant ID.

For logging, I need to extract this label and forward it to Loki as the tenant header (X-Scope-OrgID).

I see in the code that something similar is already available for Kafka (headers_from_record in the CRDs):

              kafka:
                properties:
                  ack_timeout:
                    type: integer
...
                  headers_from_record:
                    additionalProperties:
                      type: string
                    type: object

More importantly, Fluentd itself supports this in its HTTP output: https://docs.fluentd.org/output/http#headers_from_placeholders

For non-Loki users, it might also be worth adding this feature to the HTTP output, under a more generic name such as headers_from_record.

atamgp commented 1 year ago

Update:

It seems the logging operator uses the Fluentd Loki output. That plugin does not yet support tenant_id_key, which the Fluent Bit Loki output does support: https://docs.fluentbit.io/manual/v/1.8/pipeline/outputs/loki

I opened a ticket asking Fluentd to support this as well. Once it does, would the logging operator only need to pass the option through? https://github.com/fluent/fluentd/issues/3992

aslafy-z commented 1 year ago

I believe this is already supported by the Fluentd Loki plugin; see https://grafana.com/docs/loki/latest/clients/fluentd/#tenant.

The tenant field also supports placeholders, so it can dynamically change based on tag and record fields. Each placeholder must be added as a buffer chunk key. The following is an example of setting the tenant based on a k8s pod label:

<match **>
  @type loki
  url "https://logs-prod-us-central1.grafana.net"
  tenant ${$.kubernetes.labels.tenant}
  # ...
  <buffer $.kubernetes.labels.tenant>
    @type memory
    flush_interval 5s
  </buffer>
</match>

Also, the associated logging-operator output field is already passed through to the fluentd configuration: https://github.com/banzaicloud/logging-operator/blob/42402497f2116f014f14ba867fdce13aba0bf171/pkg/sdk/logging/model/output/loki.go#L74
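Translated into the operator's CRD, the Fluentd example above might look roughly like the sketch below. It assumes that the Loki output's tenant field (linked above) is rendered verbatim into the Fluentd config, and that the output's buffer section accepts a tags field that becomes the Fluentd chunk key list. The pod label name tenant is borrowed from the Grafana example and is hypothetical here. For a dotted key such as capsule.clastix.io/tenant, the plain dot-separated accessor will not work as written, and the correct syntax still needs to be tested.

```yaml
# Sketch only: assumes pods carry a hypothetical "tenant" label.
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: default
spec:
  loki:
    url: http://loki:3100
    # Placeholder resolved per buffer chunk from the record's pod label
    tenant: "${$.kubernetes.labels.tenant}"
    configure_kubernetes_labels: true
    buffer:
      # The placeholder's field must also be a chunk key, so chunks are split per tenant
      tags: "$.kubernetes.labels.tenant"
```

Because the tenant field is a buffer chunk key, every flushed chunk holds records for a single tenant, so the plugin can send one X-Scope-OrgID value per request. Overriding tags replaces the operator's default chunk keys; add time back if time-based chunking is still wanted.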

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!

pepov commented 1 year ago

Thanks @aslafy-z for clarifying this! @fekete-robert, is there a good place in the docs to add this?

fekete-robert commented 1 year ago

I think we can do two things:

pepov commented 1 year ago

I would opt for the separate page, but first someone should actually try this out to confirm that it works correctly. I closed this prematurely, sorry.
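One way to try this out is to route all logs to a ClusterOutput named default (as in the original proposal, but using the tenant placeholder) with a ClusterFlow, then run a pod that carries the tenant label. Everything named below is hypothetical: the control namespace logging, the namespace team-a, the tenant value tenant-a, and the busybox logger. ClusterFlow resources are expected to live in the Logging resource's control namespace.

```yaml
# Hypothetical test setup for a per-tenant Loki ClusterOutput
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterFlow
metadata:
  name: all-to-loki
  namespace: logging   # assumed control namespace of the Logging resource
spec:
  # No match section: select logs from all namespaces
  globalOutputRefs:
    - default          # the ClusterOutput named "default"
---
apiVersion: v1
kind: Pod
metadata:
  name: log-generator
  namespace: team-a
  labels:
    tenant: tenant-a   # value expected to end up as X-Scope-OrgID
spec:
  containers:
    - name: logger
      image: busybox
      command: ["sh", "-c", "while true; do echo hello from tenant-a; sleep 5; done"]
```

After applying both resources, query Loki with X-Scope-OrgID set to tenant-a (for example with logcli's --org-id flag). The "hello from tenant-a" lines should appear under that tenant and under no other.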

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!
