jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0

Use cgroup for jvm memory limit #337

Closed: kragniz closed this 6 years ago

kragniz commented 6 years ago

This should cap the JVM's memory limit to the amount allocated in the container's cgroup.
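For context, the two experimental JDK 8 flags (assumed here to be -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap, available from 8u131) make the JVM treat the cgroup v1 memory limit as its physical memory when sizing the heap. The Python sketch below illustrates that idea under those assumptions. The helper names are hypothetical, and this is neither the JVM's code nor navigator's change.

```python
# Hypothetical helpers illustrating the idea behind the JDK 8 cgroup flags:
# the JVM reads the cgroup v1 memory limit and uses the smaller of that
# limit and the host's physical memory when computing heap sizes.
CGROUP_V1_LIMIT_FILE = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def effective_memory_limit(limit_file_contents: str, host_memory_bytes: int) -> int:
    """Return the smaller of the cgroup limit and the host's physical memory.

    An unconstrained cgroup v1 reports a huge sentinel value (close to 2**63),
    so taking the minimum falls back to host memory in that case.
    """
    cgroup_limit = int(limit_file_contents.strip())
    return min(cgroup_limit, host_memory_bytes)


def read_effective_memory_limit(host_memory_bytes: int,
                                path: str = CGROUP_V1_LIMIT_FILE) -> int:
    """Read the limit file; if it does not exist (no cgroup v1), use host memory."""
    try:
        with open(path) as f:
            return effective_memory_limit(f.read(), host_memory_bytes)
    except FileNotFoundError:
        return host_memory_bytes


# A container with a 6Gi limit on a 64Gi host should size its heap from 6Gi.
six_gib = effective_memory_limit("6442450944\n", 64 * 2**30)
```

Without the flags, a JDK 8 JVM sizes its heap from the host's memory, which is how a container can be OOM-killed even though the JVM believes it has headroom.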

kragniz commented 6 years ago

Note: not currently tested with Elasticsearch

munnerz commented 6 years ago

Ran this https://gist.github.com/munnerz/bb19540ab547fc5d2ec1e1af0f7a2963

docker.elastic.co/elasticsearch/elasticsearch:6.2.3 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:6.2.2 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:6.2.1 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:6.2.0 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:6.1.4 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:6.1.3 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:6.1.2 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:6.1.1 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:6.1.0 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:6.0.1 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:6.0.0 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:5.6.8 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:5.6.7 = openjdk version "1.8.0_161"
docker.elastic.co/elasticsearch/elasticsearch:5.6.6 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:5.6.5 = openjdk version "1.8.0_151"
docker.elastic.co/elasticsearch/elasticsearch:5.6.4 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.6.3 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.6.2 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.6.1 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.6.0 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.5.3 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.5.2 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.5.1 = openjdk version "1.8.0_141"
docker.elastic.co/elasticsearch/elasticsearch:5.5.0 = openjdk version "1.8.0_131"
docker.elastic.co/elasticsearch/elasticsearch:5.4.3 = openjdk version "1.8.0_131"
docker.elastic.co/elasticsearch/elasticsearch:5.4.2 = openjdk version "1.8.0_131"
docker.elastic.co/elasticsearch/elasticsearch:5.4.1 = openjdk version "1.8.0_131"
docker.elastic.co/elasticsearch/elasticsearch:5.4.0 = openjdk version "1.8.0_131"
docker.elastic.co/elasticsearch/elasticsearch:5.3.3 = openjdk version "1.8.0_131"
docker.elastic.co/elasticsearch/elasticsearch:5.3.2 = openjdk version "1.8.0_121"
docker.elastic.co/elasticsearch/elasticsearch:5.3.1 = openjdk version "1.8.0_121"
docker.elastic.co/elasticsearch/elasticsearch:5.3.0 = openjdk version "1.8.0_92-internal"
docker.elastic.co/elasticsearch/elasticsearch:5.2.1 = openjdk version "1.8.0_92-internal"
docker.elastic.co/elasticsearch/elasticsearch:5.2.0 = openjdk version "1.8.0_92-internal"

EDIT: updated with more versions

kragniz commented 6 years ago

cool, so 5.2.1 is the only one that won't support the cgroup flags
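The check being done by eye here can be sketched as a small version test. This assumes, as the thread's own list implies, that the cgroup flags require OpenJDK 8u131 or later, so any `1.8.0_NNN` build with NNN below 131 (including `1.8.0_92-internal`) lacks them. The function name and sample dictionary are illustrative only. Applied to the full list above, this check flags 5.3.2, 5.3.1, 5.3.0, 5.2.1 and 5.2.0.

```python
import re

# Assumption: the cgroup heap flags first shipped in JDK 8u131.
MIN_UPDATE = 131
_VERSION_RE = re.compile(r'openjdk version "1\.8\.0_(\d+)')


def supports_cgroup_flags(version_line: str) -> bool:
    """Return True if a Java 8 version line is 8u131 or newer."""
    match = _VERSION_RE.search(version_line)
    if match is None:
        raise ValueError(f"unrecognised Java 8 version line: {version_line!r}")
    return int(match.group(1)) >= MIN_UPDATE


# A few entries from the list above (image tag -> reported JVM version).
images = {
    "6.2.3": 'openjdk version "1.8.0_161"',
    "5.3.3": 'openjdk version "1.8.0_131"',
    "5.3.2": 'openjdk version "1.8.0_121"',
    "5.2.0": 'openjdk version "1.8.0_92-internal"',
}
unsupported = [tag for tag, version in images.items() if not supports_cgroup_flags(version)]
```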

munnerz commented 6 years ago

I've updated my comment with some more versions

munnerz commented 6 years ago

@cehoffman do you have a requirement for any specific elasticsearch versions? It looks like this fix would not work for:

docker.elastic.co/elasticsearch/elasticsearch:5.3.2 = openjdk version "1.8.0_121"
docker.elastic.co/elasticsearch/elasticsearch:5.3.1 = openjdk version "1.8.0_121"
docker.elastic.co/elasticsearch/elasticsearch:5.3.0 = openjdk version "1.8.0_92-internal"
docker.elastic.co/elasticsearch/elasticsearch:5.2.1 = openjdk version "1.8.0_92-internal"
docker.elastic.co/elasticsearch/elasticsearch:5.2.0 = openjdk version "1.8.0_92-internal"

cehoffman commented 6 years ago

Nope, currently on 6.2.3

(Sent by email on Apr 18, 2018, 12:07 -0500, in reply to James Munnelly's question above.)

munnerz commented 6 years ago

/lgtm /approve

jetstack-bot commented 6 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: munnerz


Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/jetstack/navigator/blob/master/OWNERS)~~ [munnerz]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

retest-bot commented 6 years ago

/retest

This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to jetstack). Review the full test history for this PR.

Silence the bot with an `/lgtm cancel` comment for consistent failures.

cehoffman commented 6 years ago

Haven't been able to update navigator to a version that includes this change, but I updated a Docker image for Elasticsearch to include these flags and found a couple of problems. For Elasticsearch, the flags are ignored because its default JVM options already set the min/max heap (-Xms/-Xmx). From the official blog post introducing these flags: "When these two JVM command line options are used, and -Xmx is not specified, the JVM will look at the Linux cgroup configuration."
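Because the cgroup-based sizing only applies when no explicit heap size is given, an image has to drop -Xms/-Xmx before adding the flags. A minimal sketch, assuming the two flags are -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap; `cgroup_heap_options` is a hypothetical helper, not part of navigator or Elasticsearch.

```python
# Assumed flag names for the JDK 8 (8u131+) experimental cgroup support.
CGROUP_FLAGS = ["-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap"]


def cgroup_heap_options(jvm_options):
    """Drop explicit -Xms/-Xmx heap sizes and append the cgroup flags once each.

    Only the -Xms/-Xmx prefixes are handled; other ways of fixing the heap
    size (for example -XX:MaxHeapSize=) are left untouched by this sketch.
    """
    kept = [opt for opt in jvm_options if not opt.startswith(("-Xms", "-Xmx"))]
    for flag in CGROUP_FLAGS:
        if flag not in kept:
            kept.append(flag)
    return kept


example = cgroup_heap_options(["-Xms1g", "-Xmx1g", "-XX:+UseConcMarkSweepGC"])
```

Unrelated options keep their order, and the flags are appended only if missing, so running the helper twice gives the same result.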

I then built a new image with the default heap flags removed. This resulted in constant OOMs, my guess being because of the memory required by pilot. Setting InitialRAMFraction to 2 has been successful so far. This gives the JVM a good initial heap, which helps with fragmentation, but it will still eventually OOM because it shares the container's memory with pilot. I think this is an argument for running pilot as another container in the deployment instead of copying it into the database container.

InitialRAMFraction did not work the way I expected: in a container with a 6Gi limit, the JVM allocated a max heap of only 1.5Gi and never increased it. I reverted to using the Xms and Xmx flags until the OOMs caused by pilot's memory use can be addressed.
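One plausible explanation for the 1.5Gi figure is arithmetic: in JDK 8, InitialRAMFraction only sets the initial heap, while the max heap comes from MaxRAMFraction, whose default is 4, and the initial heap is clamped to the max. With the cgroup limit treated as physical memory, 6Gi / 4 = 1.5Gi. The sketch below is a simplified model of those ergonomics (it ignores minimum heap sizes, MaxRAM and alignment); `heap_sizes` is a hypothetical helper.

```python
GIB = 2**30


def heap_sizes(memory_limit_bytes, max_ram_fraction=4, initial_ram_fraction=64):
    """Simplified JDK 8 heap ergonomics; returns (initial_heap, max_heap) in bytes.

    Defaults mirror JDK 8's MaxRAMFraction=4 and InitialRAMFraction=64.
    The initial heap is clamped so it never exceeds the max heap.
    """
    max_heap = memory_limit_bytes // max_ram_fraction
    initial_heap = min(memory_limit_bytes // initial_ram_fraction, max_heap)
    return initial_heap, max_heap


# 6Gi limit with -XX:InitialRAMFraction=2: 3Gi initial is clamped to the 1.5Gi max.
observed = heap_sizes(6 * GIB, initial_ram_fraction=2)
```

Under this model initial and max heap end up equal, which matches the heap never growing; a larger max would need MaxRAMFraction (for example 2 gives 3Gi of a 6Gi limit), at the cost of less room for pilot.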

Edit: also on the subject of flags, Elasticsearch needs the -Des.cgroups.hierarchy.override=/ flag to get correct monitoring results.
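A minimal sketch of passing these options to the Elasticsearch container through the ES_JAVA_OPTS environment variable, expressed as a Kubernetes container env entry. The helper name `es_java_opts_env` and the heap size are hypothetical; the fixed -Xms/-Xmx pair reflects the fallback described above, and the override property is the one named in the comment.

```python
def es_java_opts_env(heap_size: str) -> dict:
    """Build a hypothetical Kubernetes env entry for the Elasticsearch container.

    -Xms and -Xmx are set to the same value so the heap is fixed up front;
    the cgroups override is the property mentioned for correct monitoring.
    """
    opts = [
        f"-Xms{heap_size}",
        f"-Xmx{heap_size}",
        "-Des.cgroups.hierarchy.override=/",
    ]
    return {"name": "ES_JAVA_OPTS", "value": " ".join(opts)}


env_entry = es_java_opts_env("3g")
```

Setting -Xms equal to -Xmx follows the usual Elasticsearch recommendation, but the heap still has to leave enough of the container limit for pilot.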