stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository

Reduce memory overprovisioning for JVM products #468

Open · razvan opened 11 months ago

razvan commented 11 months ago

Description

Users should make better use of the memory available in their Kubernetes cluster.

Currently the Stackable operators do not permit a range for memory requests and limits: both values are always set to the same value to guarantee that possible pod re-schedules will not fail due to lack of memory.

But this policy leads to a lot of overprovisioning: pods request more memory than they need "just in case", and sometimes products cannot even be installed at all, as in this example:

(screenshot)

A related problem is that Stackable operators compute a hard value for the JVM heap (-Xmx) that could not adapt if the user were allowed to specify a range between memory request and limit. Currently around 80% of the container memory is allocated to the JVM heap. This can also lead to unused but committed memory.
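For illustration, a minimal sketch (not the actual operator code) of what such a hard heap setting amounts to; the 0.8 factor and the helper name are assumptions based on the "around 80%" figure above:

// Hypothetical sketch only: derive a fixed -Xmx from the container memory limit.
const JVM_HEAP_FACTOR: f64 = 0.8; // assumed, mirroring the ~80% heap share

fn jvm_heap_flag(limit_mib: u64) -> String {
    let heap_mib = (limit_mib as f64 * JVM_HEAP_FACTOR) as u64;
    format!("-Xmx{heap_mib}m")
}

// A 2048 MiB limit yields "-Xmx1638m", fixed at reconcile time; the heap cannot
// follow the effective memory if request and limit were allowed to differ.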

Proposal

  1. Accepted (see comment below): Operators should allow a range between memory request and limit for a container.
  2. Rejected: The JVM heap should not be set to a hard value but rather to a ratio of the effective memory by making use of the -XX:MaxRAMPercentage and -XX:MinRAMPercentage properties (sketched below). For details, see this article.
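As a rough sketch of what this rejected alternative would have meant (the flags are standard HotSpot options; the 80.0 value is only an assumption):

// Rejected alternative, for illustration only: instead of a hard -Xmx, let the
// JVM size its heap relative to the memory the container actually receives.
fn jvm_ratio_flags() -> Vec<String> {
    vec![
        "-XX:MinRAMPercentage=80.0".to_owned(), // assumed percentage
        "-XX:MaxRAMPercentage=80.0".to_owned(),
    ]
}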

Articles

These two articles suggest that Pods can already be evicted when their memory usage exceeds the request (not only the limit):

https://www.linkedin.com/pulse/kubernetes-requests-limits-pod-isaac-carrington/
https://kubernetes.io/blog/2021/11/26/qos-memory-resources/

razvan commented 11 months ago

We agreed to extend the MemoryLimits struct by adding a new optional field:

request: Option<Quantity>

If not specified, it defaults to the value of limit.

In addition, we make the existing limit field required.

This is a non-breaking change: the new field is optional, and even though the old field is now required, the entire struct remains optional in the CRDs.

razvan commented 11 months ago
diff --git a/src/commons/resources.rs b/src/commons/resources.rs
index 3fea9c0..12b1659 100644
--- a/src/commons/resources.rs
+++ b/src/commons/resources.rs
@@ -164,7 +164,8 @@ pub struct MemoryLimits<T> {
     /// You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki.
     /// For example, the following represent roughly the same value:
     /// `128974848, 129e6, 129M,  128974848000m, 123Mi`
-    pub limit: Option<Quantity>,
+    pub limit: Quantity,
+    pub request: Option<Quantity>,
     /// Additional options that can be specified.
     #[fragment_attrs(serde(default))]
     pub runtime_limits: T,
@@ -305,10 +306,10 @@ impl<T, K> Into<ResourceRequirements> for Resources<T, K> {
     fn into(self) -> ResourceRequirements {
         let mut limits = BTreeMap::new();
         let mut requests = BTreeMap::new();
-        if let Some(memory_limit) = self.memory.limit {
-            limits.insert("memory".to_string(), memory_limit.clone());
-            requests.insert("memory".to_string(), memory_limit);
-        }
+
+        limits.insert("memory".to_string(), self.memory.limit.clone());
+        let request = self.memory.request.unwrap_or(self.memory.limit);
+        requests.insert("memory".to_string(), request);

         if let Some(cpu_max) = self.cpu.max {
             limits.insert("cpu".to_string(), cpu_max);
Techassi commented 11 months ago

Hint: You might want to take a look at this builder: https://github.com/stackabletech/operator-rs/blob/main/src/builder/pod/resources.rs

(which should perhaps be moved up one module level)