vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
18.29k stars 1.61k forks

Using kubelet /pods API to enrich kubernetes_logs metadata #18449

Open nmiculinic opened 1 year ago

nmiculinic commented 1 year ago

Use Cases

Other logging agents, e.g. fluent-bit, can use the kubelet endpoint to enrich Kubernetes metadata for pods: https://docs.fluentbit.io/manual/pipeline/filters/kubernetes#optional-feature-using-kubelet-to-get-metadata

The use case is alleviating pressure on the k8s API server. I run a pretty large installation and see annotation failure errors (24 errors in the last week, spread across 5 k8s clusters, though there are more of them).

Proposal

Use the kubelet API endpoint to get pod metadata, which would put less load on the kube API server.
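
For illustration, here is a minimal Python sketch of the idea, not Vector's implementation. It assumes the kubelet serves a node-local PodList as JSON at `/pods` on its secure port (10250), and that the agent authenticates with the pod's service account token. The URL, token path, and the helper names `fetch_pod_list` and `index_pods` are assumptions made for this example.

```python
import json
import urllib.request

# Assumed defaults: the kubelet's secure port and the in-pod service account token path.
KUBELET_PODS_URL = "https://127.0.0.1:10250/pods"
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def fetch_pod_list(url=KUBELET_PODS_URL, token_path=TOKEN_PATH, context=None):
    """Fetch the node-local PodList from the kubelet instead of the API server."""
    with open(token_path) as f:
        token = f.read().strip()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    # `context` is an optional ssl.SSLContext; TLS setup depends on the cluster.
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)


def index_pods(pod_list):
    """Map (namespace, pod name) to the metadata used for log enrichment."""
    index = {}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        index[(meta.get("namespace"), meta.get("name"))] = {
            "uid": meta.get("uid"),
            "labels": meta.get("labels", {}),
            "annotations": meta.get("annotations", {}),
            "node_name": pod.get("spec", {}).get("nodeName"),
        }
    return index
```

Since the kubelet only knows about pods on its own node, each agent would query its local kubelet and the API server would see no per-node list or watch traffic for this purpose.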

jszwedko commented 1 year ago

Hi @mcasper !

Did you try setting https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#use_apiserver_cache to true? I believe that will achieve the behavior you are looking for. It's not the default mainly for compatibility reasons since it was added later and does introduce the risk of stale data.

mcasper commented 1 year ago

I think maybe you meant to ping @nmiculinic? 😄

jszwedko commented 1 year ago

Doh, yes, sorry @mcasper !

nmiculinic commented 1 year ago

@jszwedko How does this work? It's not exactly what I'm asking for, since as far as I understand the call still has to go to the kube API server?

jszwedko commented 1 year ago

Ah, you are right, I misunderstood what that option is doing. I think it still hits the API server, but allows the server to answer with cached results. I think you'll still see reduced control plane pressure by using it though. That is the reason it was added (see https://github.com/vectordotdev/vector/issues/16797 for details).
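
As a rough illustration of the difference, in the Kubernetes API a list request with `resourceVersion=0` may be answered from the API server's cache rather than with a consistent read from etcd. The Python sketch below builds such a request URL. Whether `use_apiserver_cache` maps exactly to this parameter is an assumption here, and `pod_list_url` and the node field selector are hypothetical, used only for this example.

```python
from urllib.parse import urlencode


def pod_list_url(api_server, node_name, use_cache):
    """Build a pod list URL for one node, optionally allowing cached results."""
    # Restrict the list to pods scheduled on the given node.
    params = {"fieldSelector": f"spec.nodeName={node_name}"}
    if use_cache:
        # resourceVersion=0 lets the API server answer from its cache,
        # at the cost of possibly stale data.
        params["resourceVersion"] = "0"
    return f"{api_server}/api/v1/pods?{urlencode(params)}"
```

Either way the request still goes to the API server; only the cost of serving it changes.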