grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0
23.74k stars 3.43k forks source link

[loki-operator] CRD: support podMonitor k8s CRD #7387

Open liguozhong opened 2 years ago

liguozhong commented 2 years ago

Is your feature request related to a problem? Please describe. The promtail service discovery module is implemented with reference to the service discovery of prometheus, which achieves great flexibility, but the configuration is relatively complex and does not conform to the design concept of the k8s environment.

The CRD in k8s is more popular with operater personnel.

So we should implement a podMonitor CRD similar to the prometheus-operator project to simplify the service discovery feature of promtail. And make promtail easier to be integrated into the CICD process in the k8s environment.

server:
 ...
scrape_configs:
  - job_name: kubernetes-pods-name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_label_name
        target_label: __service__
      - source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: __host__
      - action: drop
        regex: ''
        source_labels:
          - __service__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        replacement: $1
        separator: /
        source_labels:
          - __meta_kubernetes_namespace
          - __service__
        target_label: job
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__

Describe the solution you'd like CRD is simple to read 👍

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
spec:
  selector:
    matchLabels:
      app: example-app
  namespaceSelector:
    any: true

Describe alternatives you've considered https://github.com/grafana/loki/pull/7247 introduce reload endoint / signal to promtail. This PR will be useful when the CRD applied.

logging-operator: https://banzaicloud.com/docs/one-eye/logging-operator/configuration/flow/ Logging operator for Kubernetes based on Fluentd and Fluent-bit. not promtail. if Fluentd and Fluent-bit has some bug, I dont know how to fix it .

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: flow-sample
  namespace: default
spec:
  filters:
    - parser:
        remove_key_name_field: true
        parse:
          type: nginx
    - tag_normaliser:
        format: ${namespace_name}.${pod_name}.${container_name}
  localOutputRefs:
    - s3-output
  match:
    - select:
        labels:
          app: nginx

https://github.com/banzaicloud/logging-operator image

Additional context Add any other context or screenshots about the feature request here. podmonitor design: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/design.md#podmonitor podmonitors link: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/crds/crd-podmonitors.yaml

https://github.com/grafana/loki/issues/7387#tasklist-block-5027866a-e326-4c77-beef-b8c3e7a91ecf

liguozhong commented 2 years ago

help wanted @periklis , loki-operator module

liguozhong commented 2 years ago

“Check out the Grafana Agent. PodLogs CRD is performing the functionality you are looking for. https://grafana.com/docs/agent/latest/operator/custom-resource-quickstart/

thanks Joel Verezhak. slack

liguozhong commented 2 years ago

I took a deep look at the grafana-agent code https://github.com/grafana/agent/blob/main/pkg/logs/logs.go, which is actually promtail. It's pretty cool.

so the question is, if a promtail user only collects logs, is it necessary to replace it with grafana-agent for the CRD feature.

periklis commented 2 years ago

@liguozhong as per design the Loki operator is not supporting/managing promtail or any log collector. We believe that the Loki Operator is better serving Loki-only installations. This is so because collector have so many config variants and forthcoming trends like OTEL make it hard to specialize on one.

Saying this, I totally see the value in a promtail CRD with variants like PodMonitor and ServiceMonitor. I can only see this happening in separate operator though

liguozhong commented 2 years ago

@periklis hi , Thanks for the professional advice you shared. so generally what we need is a promtail-operator, rather than continuing to expand on the basis of loki-operator, which will distract loki-operator.

So the question becomes, is it possible for us to develop another promtail-operator to support the PodMonitor CRD. There are already so many log crds,Grafana Agent, logging-operator ...

Is it possible to have a promtail-operator SIG build this component independently first. This will not distract loki and loki-operator maintainers too much time.

periklis commented 2 years ago

From experience wring a couple operators till today writing a dedicated promtail-operator should be an easy task. My advice though is to spent some time designing and crafting the various CRDs you need for that purpose. AFAICS above you propose two to three CRDs, a promtail CRD for the deamonset, a PodMonitor to reconcile scrape configs, a ServiceMonitor to reconcile scrape configs. Even starting with two of them, e.g. promtail + servicemonitor/podmonitor, I would advise:

  1. Start with a draft of Promtail CRD and eliminate obsolete promtail config field you won't expose anyway.
  2. Work out the contradictory config fields, e.g. by make fields enums where possible
  3. Work out the status field for the Promtail CRD to inform users about error/degraded conditions
  4. Work out how you inject credentials, certificates, tokens, CAs via LocalObjectReference ConfigMap/Secret in your Promtail CRD.
  5. Think about how you express tenants and what this means about fields in your CRD.

I am glad to help you with reviews on design even on markdown first.

liguozhong commented 2 years ago

thanks! @periklis I am very happy that the loki community intends to support the podMonitor CRD, which will make loki even more successful in the cloud-native era. So that more users of the original prometheus can easily monitor logs.

I'd be happy to implement a preliminary version of the promtail+ podMonitor CRD for the community to get an initial progress.

periklis commented 2 years ago

I recommend to add @cyriltovena @owen-d and @slim-bean to the discussion here. I agree that promtail needs a separate small operator but as said above the competition with so many other log collection offerings , e.g. OTEL, OpenShift's ClusterLoggingOperator, to name some is a thing to consider. I would not object having an operator for promtail/grafana-agent that takes care of the log collection/normalization parts.

verejoel commented 2 years ago

@liguozhong I would be very happy to contribute to the Promtail operator if I can in any way! Have a big interest from my side in enabling this

liguozhong commented 2 years ago

@liguozhong I would be very happy to contribute to the Promtail operator if I can in any way! Have a big interest from my side in enabling this

oh, that's great, looking forward to your contribution. Before writing code, we need to wait for more loki maintainers to participate in issue discussions to decide whether we need promtail-operator. if not ...