grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

[promtail] support scraping from Kubernetes API #7205

Open barrettj12 opened 2 years ago

barrettj12 commented 2 years ago

Is your feature request related to a problem? Please describe.

Promtail supports generating scrape targets using the K8s API (kubernetes_sd_configs). It can get node/pod/container names from the K8s API and use these to generate __path__ for logfiles to scrape. But AFAIK, it can't get the logs themselves from the K8s API.

While Promtail can utilize the Kubernetes API to discover pods as targets, it can only read log files from pods that are running on the same node as the one Promtail is running on.

The usual pattern for k8s containers is to stream logs to stdout; k8s / the container engine then saves these for you. The saved logs are accessible via kubectl logs. It makes sense for Promtail to get logs the same way.

Describe the solution you'd like

A new type of scrape_config which directs Promtail to scrape from the Kubernetes API. This would be analogous to e.g. scraping from the journal. This could be used effectively in combination with the existing kubernetes_sd_configs. Something like:

scrape_configs:
  - job_name: k8s-logs
    kubernetes_sd_configs:
      - role: pod
    kubernetes_api:
      namespace: ...
      pod: __meta_kubernetes_pod_name
      container: __meta_kubernetes_pod_container_name
    relabel_configs:
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container

Describe alternatives you've considered

The alternative recommended in the Promtail docs is to use a hostPath volume mount to the logs directory on the host (/var/log/pods and/or /var/log/containers). However, from the k8s docs:

HostPath volumes present many security risks, and it is a best practice to avoid the use of HostPaths when possible.

It would be safer to use the Kube API, considering it already has a mechanism for capturing and serving logs. It also feels kinda hacky mounting /var/log/pods to our Promtail container.

Additional context

The Kubernetes API provides a method to read logs. This would be the best way to scrape the logs. I assume kubectl logs internally uses this same API endpoint.
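For illustration, here is a minimal Python sketch of calling that log endpoint directly. It assumes the core/v1 "read pod log" path (`/api/v1/namespaces/{namespace}/pods/{name}/log`) with its `container`, `follow` and `tailLines` query parameters, and the standard in-cluster service account token and CA files. The function names `build_log_url` and `stream_pod_logs` are hypothetical helpers, not part of Promtail or any Kubernetes client.

```python
import ssl
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Standard mount point for in-cluster service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_log_url(api_server, namespace, pod, container=None,
                  follow=False, tail_lines=None):
    """Build the URL of the core/v1 'read pod log' endpoint (hypothetical helper)."""
    path = (
        f"/api/v1/namespaces/{quote(namespace, safe='')}"
        f"/pods/{quote(pod, safe='')}/log"
    )
    params = {}
    if container:
        params["container"] = container
    if follow:
        # Keep the connection open and stream new lines as they are written.
        params["follow"] = "true"
    if tail_lines is not None:
        params["tailLines"] = str(tail_lines)
    query = urlencode(params)
    return api_server.rstrip("/") + path + ("?" + query if query else "")


def stream_pod_logs(url):
    """Yield log lines from the API server. Only works from inside a cluster."""
    with open(f"{SA_DIR}/token") as f:
        token = f.read().strip()
    ctx = ssl.create_default_context(cafile=f"{SA_DIR}/ca.crt")
    req = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(req, context=ctx) as resp:
        for raw in resp:
            yield raw.decode("utf-8", errors="replace").rstrip("\n")


if __name__ == "__main__":
    # Pod and container names below are placeholders.
    print(build_log_url("https://kubernetes.default.svc", "default", "my-pod",
                        container="app", follow=True, tail_lines=10))
```

The service account used would need RBAC permission to `get` the `pods/log` subresource, which is a much narrower grant than a hostPath mount of the node's log directory.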

dannykopping commented 2 years ago

I really like this idea, and I'm not sure why we opted for reading directly from files initially. This indeed feels hacky & bleeds the abstraction that k8s provides by having to know where it puts its logs.

I suspect we take this approach because of log rotation, though:

Only the contents of the latest log file are available through kubectl logs. For example, if a Pod writes 40 MiB of logs and the kubelet rotates logs after 10 MiB, running kubectl logs returns at most 10MiB of data.
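The quoted limit can be read as a simple bound. This is a toy model only, assuming nothing beyond what the quote states; `api_visibility_bounds` is a hypothetical name.

```python
def api_visibility_bounds(written_mib, rotate_after_mib):
    """Return (max MiB visible via the API, min MiB no longer visible).

    Only the latest log file is served, and it never exceeds the rotation size.
    """
    visible_max = min(written_mib, rotate_after_mib)
    missing_min = written_mib - visible_max
    return visible_max, missing_min


print(api_visibility_bounds(40, 10))  # (10, 30)
```

A scraper that follows the stream continuously would only hit this bound for data written before it connected, or while it was disconnected.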

barrettj12 commented 2 years ago

Kubernetes handles the log rotation, so if there are a lot of logs, sure we're gonna lose the old logs. This is not really any different to the file case though, as any serious logging system using files will also have log rotation.

I think it's fine to provide the feature with the proviso that you won't be able to access logs that are already rotated by k8s. If this is really an issue for someone, they can still use the hostPath method.

dannykopping commented 2 years ago

I took a closer look. What you're suggesting will indeed be equivalent to what we're doing now, but making better use of k8s' interfaces.

As of k8s v1.21 the log rotation is at least tuneable, so folks can increase the containerLogMaxSize to write larger log files to ensure promtail doesn't miss anything.

erNail commented 2 years ago

We would also be interested in this feature. Our logs are not stored in the path that promtail generates from the K8s API, but with journald. Scraping the logs directly from the API would resolve this problem.

mhix-valimail commented 1 year ago

I'm also very much interested in this feature. As of Kubernetes v1.24, dockershim is gone, and paths on the node that once existed no longer do.

cyqui commented 1 year ago

I am also interested, and it looks like a great idea. Probably the same context as many: we have some apps made by third parties that do not provide a "log path"; they just write to stdout. Otherwise the next step is: set up rotation, redirect to files, collect the files...

davinkevin commented 1 year ago

+1 to get logs from specific pods without access to the filesystem.

cstyan commented 1 year ago

Hello, thanks for your feature request.

We're currently reevaluating Promtail's position as a project within Grafana Labs. Internally we're actually using the Agent for both metrics and logs collection at this point.

While we haven't made a formal decision yet, we expect that in the near future all new feature work will be done in the Agent's log collection pipelines rather than in Promtail. I'd suggest opening an issue to request this in their repo; pulling logs directly from the k8s API is a valid use case that should eventually be supported IMO.