kubernetes / autoscaler

Autoscaling components for Kubernetes

Support memory request recommendations for jvm based workloads #5029

Open vsevel opened 2 years ago

vsevel commented 2 years ago

Which component are you using?:

vertical-pod-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

the GKE documentation indicates: "Vertical Pod autoscaling is not ready for use with JVM-based workloads due to limited visibility into actual memory usage of the workload."

What are those limitations? What visibility do you need exactly? And which use cases does it preclude? It would be nice to have a better understanding of what is supported today (1st step), and then (2nd step) to develop the missing pieces for effective support of JVM-based workloads, on par with other technologies.

Starting in JDK 12 (and backported to JDK 11), JEP 346: Promptly Return Unused Committed Memory from G1 allows the GC to give back to the OS memory pages that are no longer needed. Modern GCs all provide a similar feature: https://www.baeldung.com/gc-release-memory

This allows the memory used by the process (heap and other) to track the real needs of the application more closely.

With this in place, my understanding is that VPA can already calculate accurate memory requests by watching the consumed memory over time. Can you confirm? If the calculated recommendation is not accurate, please describe the type of error it introduces: is VPA going to overestimate or underestimate memory requests? If it overestimates memory, this might still be a better solution than setting request=limit (Guaranteed QoS).

In my company we have started an evaluation of VPA for memory requests on JVM-based workloads (we don't intend to use VPA for memory limits). On 100 pods, the average configured memory request is 1600Mb (max = 6Gb), and the average VPA-recommended memory request is 1240Mb (max = 4.5Gb). The average saving is 425Mb.

There are also 20 pods with "memory capped to container limit", which suggests that the memory request recommended by VPA is greater than the limit, which is puzzling.

Describe the solution you'd like.:

Describe any alternative solutions you've considered.:

None, really. The only alternative is to do manual tuning and setup of requests in the pod spec.

Additional context.:

See this KubeCon talk.

cc @mwielgus @matthyx

jbartosik commented 2 years ago

I didn't look into this, but from what I recall from talking to others, VPA recommendations might explode for JVM workloads.

The JVM will run garbage collection when it thinks it's close to running out of memory. So if you're looking at memory usage you'll see it climb until it approaches some threshold (call it ${GC threshold}), then drop after a GC, over and over.

VPA looks at the daily peak of memory usage (max memory usage in a 24h window for each container). So if the memory usage cycle takes less than 24h, to VPA it will look like the container is using ${GC threshold}.

${GC threshold} should depend on the memory request (otherwise increasing the memory request wouldn't make sense). So if VPA grows the memory request, the threshold also increases. But since memory usage (as observed by VPA) depends on the threshold, the usage observed by VPA grows too, so the recommendation grows, and so on, until VPA starts capping the recommendation because it wouldn't fit on any node in the cluster.
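
If that runaway loop is the worry, one mitigation (a sketch only; the names and the 2Gi value are made up) is to put an explicit ceiling on the recommendation with maxAllowed in the VPA resourcePolicy:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-jvm-app-vpa           # hypothetical name
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-jvm-app             # hypothetical workload
      updatePolicy:
        updateMode: "Auto"
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            # explicit ceiling so a usage/recommendation feedback loop cannot grow unbounded
            maxAllowed:
              memory: 2Gi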

jbartosik commented 2 years ago

In my company we have started an evaluation of VPA for memory requests on JVM-based workloads (we don't intend to use VPA for memory limits). On 100 pods, the average configured memory request is 1600Mb (max = 6Gb), and the average VPA-recommended memory request is 1240Mb (max = 4.5Gb). The average saving is 425Mb.

This sounds like memory usage is not growing anywhere close to the request (which conflicts with an assumption in my previous comment). Can you share why?

vsevel commented 2 years ago

Just to be sure, ${GC threshold} is the value at which the GC is going to kick in and do a full GC? Actually, GC behavior is a bit more complex, and it depends on which GC implementation we are talking about; there can be different behaviors with G1 vs ZGC, for instance. Usually there are zones (e.g. new and old). When created, objects stay in the new zone for a while; if they survive several GC cycles, they eventually land in the old zone, where another algorithm takes over. 99% of objects get GCed in the new zone, and that is where the real app activity is.

I have a memory-intensive app (a Lucene indexer server) which has zero activity (0 ms per minute, 0 cycles per minute) on the G1 Old Generation, and on average around 400 ms per minute (min 200 ms, max 1100 ms) of activity on the G1 Young Generation. So the heap memory profile (old+new) for this app is fairly stable. It does increase slowly, as you describe, but it will probably take days before there is a full GC. It is then my responsibility to set an Xmx that is big enough to accommodate my use cases, and tight enough that I will not waste committed heap memory that the JVM does not feel like returning. G1 has made a lot of effort to not waste memory, like the JEP 346 I was referring to; I need to watch this.

Regardless, the heap is not the only memory space to account for in a JVM: there is all the memory consumed by the process that is not part of the heap, and that extra memory can be 50 to 100% of the heap. There are JVM flags for this: MaxRAMFraction=2 means 1/2 of the max memory in the container will be reserved for the heap, so if your container has a limit of 1Gb, your heap (Xmx) will be 512Mb. The point is that it might be difficult for VPA to assess the heap correctly, but recommendations are still useful for everything else, and that is not necessarily tiny.
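
As a concrete illustration of that arithmetic (a hypothetical container spec; JAVA_OPTS_APPEND is only honored by images that read it, like the one in the deployment quoted further down):

    containers:
      - name: jvm-app                    # hypothetical container
        image: example/jvm-app:latest    # hypothetical image
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 1Gi                  # the limit the JVM sees as its max RAM
        env:
          - name: JAVA_OPTS_APPEND
            # MaxRAMFraction=2 -> heap is sized to 1/2 of container memory, i.e. ~512Mb;
            # metaspace, code cache, thread stacks and direct buffers must fit in the rest
            # (newer JDKs prefer -XX:MaxRAMPercentage over the deprecated MaxRAMFraction)
            value: '-XX:MaxRAMFraction=2'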

This sounds like memory usage is not growing anywhere close to the request (which conflicts with an assumption in my previous comment). Can you share why?

With that in mind, I can imagine a few things that happen there and appear to be working:

Worst case scenario: even if VPA is not assessing the heap correctly (because it contains objects that could be GCed), the value that VPA captures (the total memory of the process) may be oversized, but it can't be undersized (otherwise the app would have crashed). So it is somewhere between correct and oversized, and it will be very much oversized if the Xmx is way too big for the app (too much comfort) and the GC is lazy in the new zone (objects stay a long time in the heap before being GCed).

So I am thinking VPA should be relevant even for memory requests on JVM workloads, assuming:

There are situations, however, where I do not understand how VPA could calculate such a big value. For instance I have this deployment:

      containers:
        - resources:
            limits:
              cpu: '2'
              ephemeral-storage: 1Gi
              memory: 728Mi
            requests:
              cpu: 100m
              ephemeral-storage: 1Gi
              memory: 728Mi
...
            - name: JAVA_OPTS_APPEND
              value: '-Xms128m -Xmx256m'

and VPA calculated:

  recommendation:
    containerRecommendations:
      - containerName: container-efxservices
        lowerBound:
          cpu: 25m
          memory: '857515809'
        target:
          cpu: 93m
          memory: '1102117711' ==> 1051 Mb??
        uncappedTarget:
          cpu: 93m
          memory: '1102117711' ==> 1051 Mb??
        upperBound:
          cpu: 743m
          memory: '6500245275' ==> 6199 Mb???

How could it recommend a value greater than the limit? This can't be something it saw being consumed... On this app I see the heap (old+new) going regularly between 120Mb and 185Mb every 1 or 2 minutes, the GC activity is low (time spent and number of cycles), and the container is stable around 750Mb (it is not G1 in that case, so unused memory is not given back to the OS). The app has gone through 2 restarts (not OOM killed, I believe) in the last few hours, so there might be a reason for the VPA calculation? Regardless, I do not understand how you would come up with values greater than the limit.

The bottom line is that even if heap behavior is a challenge in a JVM workload, worst case you might tend to overestimate the heap needed, and the recommendation should still be relevant for the non-heap memory. Do you think that makes sense?

note: we are not using VPA for memory limits, only requests
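
For what it's worth, this requests-only mode can be stated explicitly in the VPA object (a sketch with hypothetical names) via controlledValues:

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: efxservices-vpa              # hypothetical name
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: efxservices                # hypothetical workload
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            controlledResources: ["memory"]
            # act on requests only and leave limits untouched
            controlledValues: RequestsOnly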

vsevel commented 2 years ago

Have you been able to give it some thought? cc @matthyx

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

lsmith77 commented 1 year ago

/remove-lifecycle rotten

brunoborges commented 1 year ago

even if heap behavior is a challenge in a JVM workload, worst case you might tend to overestimate the heap needed, and the recommendation should still be relevant for the non-heap memory. Do you think that makes sense?

A better way to define the heap is using -XX:MaxRAMPercentage. By default, with over 512MB of RAM in the container, this flag is set to 25 (25%). You can customize it and set it to, say, 75%. Once the container gets more memory thanks to VPA, the JVM will restart with a bigger heap size.

So, if you are going to use VPA, make sure you don't use -Xmx, otherwise you are not going to take advantage of the extra memory.
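
A minimal sketch of that setup (hypothetical names; JAVA_TOOL_OPTIONS is read by the JVM itself, so it works regardless of the base image's entrypoint):

    containers:
      - name: jvm-app                    # hypothetical container
        image: example/jvm-app:latest    # hypothetical image
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 1Gi
        env:
          - name: JAVA_TOOL_OPTIONS
            # no -Xmx: the JVM derives the heap size from the container memory it detects
            value: '-XX:MaxRAMPercentage=75.0'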

As for non-heap memory, there is more than just GC data structures using it. The JIT compilers in the JVM also use non-heap memory. Depending on the libraries or products/projects running inside the JVM, these may also use off-heap memory buffers, consuming extra memory. Different GCs also have different native memory needs. The 75% heap rule of thumb works well for almost all cases with 1GB and up of container memory.

des1redState commented 1 year ago

This is something I'm interested in. However, JVM recommendations aren't currently possible via the Recommender, at least for memory, because the JVM reserves memory on startup and allocates its heaps/pools internally, meaning metrics from the container itself are useless.

What we need is the ability to define a custom Prometheus query for memory usage, for example jvm_memory_used_bytes assuming you're exporting JVM metrics to Prom' using micrometer or something.

In addition (but out of the scope of this project), the following JVM parameters are usually recommended for containerized Java, though they may differ slightly based on the size of your application: -XX:+UseParallelGC -XX:MaxRAMPercentage=75 -XX:ActiveProcessorCount=2
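
Applied in a pod spec, that could look like this (hypothetical snippet):

    env:
      - name: JAVA_TOOL_OPTIONS
        # the flags suggested above; tune per application
        value: '-XX:+UseParallelGC -XX:MaxRAMPercentage=75 -XX:ActiveProcessorCount=2'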

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

lucasfcnunes commented 9 months ago

/remove-lifecycle stale

satyamsundaram commented 6 months ago

What is the progress on this? Is VPA still not ready for JVM-based apps? Although this limitation is present in GKE's VPA docs, it's not in the Kubernetes docs. What should I infer from this difference?

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

des1redState commented 3 months ago

What is the progress on this? Is VPA still not ready for JVM-based apps? Although this limitation is present in GKE's VPA docs, it's not in the Kubernetes docs. What should I infer from this difference?

There's no ability to define a custom metrics query in the Recommender at the moment, and the maintainers who've spoken up so far don't seem too interested, so no, we can't get recommendations based on JVM HEAP usage, etc.

Personally, I've given up on the VPA for JVM-based containers and am using KEDA to horizontally scale based on JVM metrics via Prometheus.
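
For reference, a rough sketch of that KEDA approach (the object names, Prometheus address, query and threshold are all hypothetical):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: jvm-app-scaler                                     # hypothetical name
    spec:
      scaleTargetRef:
        name: jvm-app                                          # hypothetical Deployment
      minReplicaCount: 2
      maxReplicaCount: 10
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://prometheus.monitoring:9090   # hypothetical address
            # average heap usage per pod, exported via micrometer
            query: avg(jvm_memory_used_bytes{area="heap", app="jvm-app"})
            threshold: "500000000"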

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

florianmutter commented 2 months ago

/remove-lifecycle rotten