openshift / origin-aggregated-logging


BZ-1897642: Decrease number of gc.log files managed by -Xlog option #2126

Closed lukas-vlcek closed 3 years ago

lukas-vlcek commented 3 years ago

Description

At the time of writing we use the -Xlog:gc option to tell the JVM to produce and manage logs containing detailed information about JVM GCs. https://github.com/openshift/origin-aggregated-logging/blob/c4a31147e4af9883d2ad749f4060a3c70e641816/elasticsearch/run.sh#L95

The question is whether we really need these files, and whether it is acceptable for them to take space on the PVC, which is primarily used for index data.

First, the gc.log files (and their rotations) are not managed by log4j2 but directly by the JVM. See JVM Unified Logging Framework https://openjdk.java.net/jeps/158

Second, according to some sources, GC logging (-Xlog) has a very low performance impact. See https://dzone.com/articles/enabling-and-analysing-the-garbage-collection-log

Third, in our case the total space taken by gc.log files will actually grow only up to 2GB before the logs start rotating. This is not a large amount of data given that some other log files (this time managed by log4j2) can grow larger than that (for example the ES deprecation log).
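For reference, that ceiling is roughly the number of rotated files multiplied by the per-file size limit. A minimal sketch of the arithmetic follows. The helper name `gc_log_budget_mb` is hypothetical, and the 64MB per-file limit is an assumption inferred from 32 files adding up to about 2GB; it is not stated in this description.

```shell
#!/bin/sh
# Hypothetical helper: approximate worst-case disk usage (in MB) of the
# JVM-rotated gc.log files.
# $1 = number of rotated files (filecount)
# $2 = size limit per file in MB (filesize)
gc_log_budget_mb() {
    echo $(( $1 * $2 ))
}

# 64 MB per file is an assumption inferred from "32 files ~ 2GB".
echo "32 files: $(gc_log_budget_mb 32 64) MB"
echo "16 files: $(gc_log_budget_mb 16 64) MB"
```

The JVM also keeps the currently active log file, so actual usage can be slightly above this estimate.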

If anything is questionable, it is not the current size of the gc.log files but rather their usefulness. When the Elasticsearch JVM runs heavy and expensive GC cycles, they are logged in the ES log files anyway (see https://discuss.elastic.co/t/change-gc-log-thresholds/155132 for a relevant discussion and how this can be tuned). I assume that the JVM-managed GC logs are mostly useful to Elasticsearch developers, because they can help uncover specific memory leaks or troubleshoot other memory issues. But IMO they add little value when supporting specific customer cases (again, if heavy GCs are running we will still see them in the regular es.log files).

I am considering either turning off the -Xlog:gc config entirely (for the reasons explained above) or halving the number of gc.log files, from 32 down to 16, which saves about 1GB of disk space.

Right now I am inclined to do the latter (keep only the 16 most recent gc logs).
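To illustrate where the file count lives, here is a rough sketch of a unified-logging option with a bounded number of rotations, using the `-Xlog` syntax from JEP 158. This is not the actual run.sh code: the helper name `gc_log_opt`, the log path, the decorators and the 64m file size are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not the actual run.sh code: builds a JVM unified-logging
# option that writes GC logs to a file with a bounded number of rotations.
# $1 = log file path, $2 = filecount, $3 = filesize (e.g. 64m)
gc_log_opt() {
    echo "-Xlog:gc*:file=$1:time,pid,tags:filecount=$2,filesize=$3"
}

# Path and size are placeholders; only filecount=16 reflects the proposal above.
ES_JAVA_OPTS="${ES_JAVA_OPTS:-} $(gc_log_opt /tmp/gc.log 16 64m)"
echo "$ES_JAVA_OPTS"
```

With this shape, moving from 32 to 16 files (or lower) is a one-value change in the `filecount` output option.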

/cc @jcantrill /assign @ewolinetz

Note: I do not think we actually need to consider back-porting this to earlier releases.


lukas-vlcek commented 3 years ago

/retest

lukas-vlcek commented 3 years ago

/test cluster-logging-operator-e2e

lukas-vlcek commented 3 years ago

/test cluster-logging-operator-e2e

lukas-vlcek commented 3 years ago

/test cluster-logging-operator-e2e

ewolinetz commented 3 years ago

@lukas-vlcek do we even need to keep 16? Can we make this even lower? What do we gain from keeping so many historical logs vs just the most recent 8 or even 4?

lukas-vlcek commented 3 years ago

/retest

lukas-vlcek commented 3 years ago

/test cluster-logging-operator-e2e

lukas-vlcek commented 3 years ago

/test elastic-operator-e2e

lukas-vlcek commented 3 years ago

/test cluster-logging-operator-e2e

jcantrill commented 3 years ago

/lgtm

openshift-ci[bot] commented 3 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, lukas-vlcek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openshift/origin-aggregated-logging/blob/master/OWNERS)~~ [jcantrill]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment