Open fullykubed opened 6 months ago
Hey @fullykubed thanks for the detailed description!
You're writing about "native sidecars", which makes me think that you're saying those sidecars are not injected during runtime. But your log output seems to contain lots of linkerd-proxy
containers, which afaik are injected? I might misunderstand how linkerd works, sorry for my limited experience with it.
If this is a case of VPA not providing recommendations for injected sidecar containers: this is a feature, not a bug. In https://github.com/kubernetes/autoscaler/issues/5617 we have a more detailed discussion around why that feature was built and how we could help people who decide that they do want VPA managing their injected containers.
I propose we take our discussion there and close this ticket. Feel free to re-open with additional information, in case I'm misunderstand what you described.
/close
@voelzmo: Closing this issue.
Thanks for the feedback and pointing me to the discussion @voelzmo .
For further context here, the "native sidecar" functionality I am referencing is a new Kubernetes feature that was enabled by default in 1.29. These sidecars are still injected but they run as init containers rather than normal containers. Docs here.
Based on the discussion you linked, it makes sense why they (or normal sidecars) wouldn't be tracked. However, these error logs started to appear in abundance only after starting to use the native sidecar functionality. I think that means something unexpected is happening in the recommender here as it would seem odd to generate 100+ of these log lines per minute on v=1, but I will defer to you about whether this is the intended behavior.
Hey @fullykubed thanks for the docs on native sidecars, this was something I apparently missed entirely! Understanding now that this is about long-running init containers, I understand why you're seeing the errors you're describing this often: VPA only looks at regular containers only when creating its internal data structures, which results in issues later on when adding the metrics.
Would evicting and recreating the Pod be something that you'd want in order to get better resource recommendations for those native sidecars? Or is this something that VPA should ignore, just like it does now, but without polluting the log with error messages?
/reopen /title Native sidecars lead to huge amounts of errors in recommender logs
@voelzmo: Reopened this issue.
/retitle Native sidecars lead to huge amounts of errors in recommender logs
/triage accepted
Hi @voelzmo , no worries!
My initial goal was to have these sidecars tracked and then cause pod evictions when appropriate just as with normal containers.
I am having a little trouble understanding whether this is possible based on #5617. I am noticing that the sidecars are not in the vpaObservedContainers
annotation. However, based on your comment here it seems like they should be as the VPA mutating webhook runs after the linkerd sidecar injector (my understanding is that they run in alphabetical order). What seems even odder is that even though they aren't "observed," for some reason they are still generating these error messages.
If it isn't possible yet, it would be nice to at least not generate the error messages.
I think right now, VPA doesn't even look at anything defined in initContainers
and just looks at the containers
section of the spec. Therefore, all the mechanisms around the annotation I mentioned above don't work for native sidecars.
We now have two options
initContainers
when adding them, such that we don't spam KeyErrors
all over the placeinitContainers
)The second option might be much harder to achieve, so I guess for now it would make sense to merge a small change taking care of the KeyErrors
and then think about if it makes sense to also support native sidecars.
I agree with that assessment!
I do think that as long-running initContainers
are going to become the norm for users of service meshes, this change in the ecosystem will cause a fairly impactful regression in the utility of the VPA.
As I lack the historical context, is there a reason that initContainers
are ignored? It looks like tracking initContainers has been requested a couple times over the years, but never had any definitive action taken.
Taking a quick look at the internals, it seems like the VPA already accounts for containers in pods that are not currently running, but perhaps there is another blocker to this functionality I am not considering?
I'd be interested in helping to bring this to fruition including building the PoC code. What would be the best way to propose and get approval for such an enhancement?
Also note that when using the prometheus history provider, long-running initContainers
are are a part of the returned metrics and get added to the VPA object as recommendations, but they do not get applied.
This doesn't cause any errors, but it can be confusing to see a recommendation but not see it applied.
/area vertical-pod-autoscaler
Which component are you using?:
vertical-pod-autoscaler
What version of the component are you using?:
Component version: 1.0.0
What k8s version are you using (
kubectl version
)?:kubectl version
OutputWhat environment is this in?:
EKS
What did you expect to happen?:
Native sidecars (initContainer with RestartPolicy of Always) should have metric samples collected and applied
What happened instead?:
The recommender generates many errors when the pod contains a native sidecar:
The metrics are not recorded in the VPACheckpoints.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
No