kubernetes / autoscaler

Autoscaling components for Kubernetes

Vertical pod autoscaler does not generate a VPA object for VM-Operator #5836

Closed ajaykumarmandapati closed 1 year ago

ajaykumarmandapati commented 1 year ago

Which component are you using?

vertical-pod-autoscaler

What version of the component are you using?:

application-version: 0.11.0, helm-chart version: 5.2.1

Component version:

What k8s version are you using (kubectl version)?:

v1.24.13-eks-0a21954

What environment is this in?:

Amazon EKS

What did you expect to happen?:

VPA generates a valid VPA object for the deployment created by the VM-operator.

What happened instead?:

We seem to have an issue using VerticalPodAutoscaler with VM-Operator. The error that we see on the VPA is: Message: The targetRef controller has a parent but it should point to a topmost well-known or scalable controller. Digging deeper, we believe it's because of the scale sub-resource present in the CRD: if we manually patch the CRD and remove the scale sub-resource, the VPA is able to recognize the target and provide recommendations.

How to reproduce it (as minimally and precisely as possible):

Create a VMAgent utilising the vm-operator, and additionally create a VPA for it. The VPA then fails to become a valid VPA object, with the error below:

Message: The targetRef controller has a parent but it should point to a topmost well-known or scalable controller.
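For reference, a minimal reproduction could look roughly like the following – this is only a sketch: the VMAgent spec is trimmed down, the remote-write URL is a placeholder, and it assumes the vm-operator CRDs are already installed.

# Sketch of a minimal reproduction (illustrative names; assumes vm-operator is installed)
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMAgent
metadata:
  name: application
  namespace: victoria
spec:
  # placeholder remote-write target; any reachable endpoint works for the reproduction
  remoteWrite:
    - url: http://vmsingle-example.victoria.svc:8429/api/v1/write
---
# VPA pointing at the Deployment that the operator generates for the VMAgent above
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vmagent-application
  namespace: victoria
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmagent-application
  updatePolicy:
    updateMode: Auto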

Anything else we need to know?: NA

ajaykumarmandapati commented 1 year ago

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling.k8s.io/v1","kind":"VerticalPodAutoscaler","metadata":{"annotations":{},"name":"vmagent-application","namespace":"victoria"},"spec":{"resourcePolicy":{"containerPolicies":[{"containerName":"vmagent","maxAllowed":{"cpu":1}}]},"targetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"vmagent-application"},"updatePolicy":{"updateMode":"Auto"}}}
  creationTimestamp: "2023-05-30T07:19:01Z"
  generation: 5
  name: vmagent-application
  namespace: victoria
  resourceVersion: "98510"
  uid: e47a9fa1-2a78-4ead-92b4-acb60671226f
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: vmagent
      maxAllowed:
        cpu: 1
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmagent-application
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2023-05-30T07:19:39Z"
    message: The targetRef controller has a parent but it should point to a topmost
      well-known or scalable controller
    status: "True"
    type: ConfigUnsupported
  - lastTransitionTime: "2023-05-30T07:19:39Z"
    message: No pods match this VPA object
    reason: NoPodsMatched
    status: "True"
    type: NoPodsMatched
  - lastTransitionTime: "2023-05-30T07:19:39Z"
    message: No pods match this VPA object
    reason: NoPodsMatched
    status: "False"
    type: RecommendationProvided
  recommendation: {}

voelzmo commented 1 year ago

Hey @ajaykumarmandapati, thanks for the detailed description!

I think what you're seeing here is the expected behavior from VPA:

  • VPA is looking for the scale subresource on your targetRef
  • But your target (the Deployment vmagent-application) is owned by another object, which also has a scale subresource

The reason is that VPA uses the scale subresource to find all Pods for a VPA object by looking at .status.selector (see this issue for an example of what this looks like).

I'm not familiar with the VMAgent and how increasing the number of replicas translates to changes in the Deployment you're referencing, but I think the fix here would be to reference the owner of the Deployment – which is a VMAgent object in your case, if I understand correctly?
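For illustration, the scale subresource of a supported target returns a Scale object whose status.selector is what VPA uses to find the Pods – roughly like this (the selector value below is made up for the example):

# e.g. GET /apis/apps/v1/namespaces/victoria/deployments/vmagent-application/scale
apiVersion: autoscaling/v1
kind: Scale
metadata:
  name: vmagent-application
  namespace: victoria
spec:
  replicas: 1
status:
  replicas: 1
  # VPA reads this label selector to match the Pods it should recommend for
  selector: app.kubernetes.io/name=vmagent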

yogeek commented 1 year ago

Hello! We have the same issue with the EMQX operator (https://github.com/emqx/emqx-operator). It introduces a CRD called "EMQX" which has a scale subresource. When you create a "kind: EMQX" resource, the operator then creates a StatefulSet, which also has a scale subresource, and the result is that the VPA description shows the same error:

The targetRef controller has a parent but it should point to a topmost well-known or scalable controller

In our case, the VPA is created by goldilocks (https://github.com/FairwindsOps/goldilocks): do we have to create an issue in the goldilocks repository, or is someone here currently working on this issue, please?

voelzmo commented 1 year ago

@yogeek: I'm only superficially familiar with Goldilocks and how it creates the VPA objects. My understanding is that for annotated namespaces it automatically creates VPAs for all the Pods that are created/updated. However, I think it is only supposed to work with the built-in controllers managing Pods? I may be wrong here, so this is definitely something to ask the Goldilocks developers about.

I assume that the error message you're getting is because the targetRef of your VPA points to the StatefulSet, which is in turn owned by a different resource and managed by a different controller?

If that's the case, I guess the question for the Goldilocks developers would be if (and how) they support custom resources – VPA cannot do much here.
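Just as an illustration of what that would mean: a manually created VPA would point its targetRef at the custom resource rather than the StatefulSet, along these lines. This is a sketch only – the name, namespace, and apiVersion are placeholders (adjust to whatever your EMQX resource actually uses), and it only works if the EMQX scale subresource populates status.selector.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: emqx-vpa            # illustrative name
  namespace: emqx           # illustrative namespace
spec:
  targetRef:
    apiVersion: apps.emqx.io/v2beta1   # adjust to the apiVersion of your EMQX resource
    kind: EMQX
    name: emqx
  updatePolicy:
    updateMode: "Off"       # recommendations only, to verify Pod matching first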

yogeek commented 1 year ago

Ok, I will get in touch with the goldilocks team then. Thank you for your help @voelzmo 👍

sudermanjr commented 1 year ago

For posterity:

Goldilocks is supposed to be able to work with additional pod controllers, however this particular situation seems to be a case we're not handling properly. Please open a bug in Goldilocks and reference this so that we can look into it. Thanks!

voelzmo commented 1 year ago

Thanks for following up on this and keeping us in the loop, @sudermanjr!

ajaykumarmandapati commented 1 year ago

> Hey @ajaykumarmandapati, thanks for the detailed description!
>
> I think what you're seeing here is the expected behavior from VPA:
>
>   • VPA is looking for the scale subresource on your targetRef
>   • But your target (the Deployment vmagent-application) is owned by another object, which also has a scale subresource
>
> The reason is that VPA uses the scale subresource to find all Pods for a VPA object by looking at .status.selector (see this issue for an example of what this looks like).
>
> I'm not familiar with the VMAgent and how increasing the number of replicas translates to changes in the Deployment you're referencing, but I think the fix here would be to reference the owner of the Deployment – which is a VMAgent object in your case, if I understand correctly?

My sincere apologies, this slipped off my radar. However, I was able to try your suggestion and unfortunately it did not work for us; we get the following error:

Cannot read targetRef. Reason: Unhandled targetRef operator.victoriametrics.com/v1beta1 / VMAgent / application, last error Resource victoria/application has an empty selector for scale sub-resource

Please see the attached VPA YAML below:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vmagent-application-vpa
spec:
  targetRef:
    apiVersion: "operator.victoriametrics.com/v1beta1"
    kind:       VMAgent
    name:       application
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: vmagent
        maxAllowed:
          cpu: 1

However, upon inspecting the CRD vmagents.operator.victoriametrics.com, its subresources section does define scale:

subresources:
  scale:
    labelSelectorPath: .status.selector
    specReplicasPath: .spec.shardCount
    statusReplicasPath: .status.shards
  status: {}
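Given that definition, the scale endpoint of the VMAgent would have to expose a non-empty .status.selector for VPA to match Pods – something like the sketch below. The selector value is purely illustrative; the error above suggests the operator currently leaves this field empty.

# Sketch of what the VMAgent status would need to contain (selector value illustrative)
status:
  shards: 1
  selector: app.kubernetes.io/name=vmagent,app.kubernetes.io/instance=application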

voelzmo commented 1 year ago

Seems like VictoriaMetrics accepted this as an enhancement on their side. Closing this issue on the VPA side; please re-open if there is something to do on our part.

/close
/remove-kind bug
/kind support

voelzmo commented 1 year ago

/close
(╯°□°)╯ ︵ ┻━┻

k8s-ci-robot commented 1 year ago

@voelzmo: Closing this issue.

In response to [this](https://github.com/kubernetes/autoscaler/issues/5836#issuecomment-1667884678):

> /close
> (╯°□°)╯ ︵ ┻━┻

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.