jenkinsci / datadog-plugin

A Jenkins plugin used to forward metrics, events, and service checks to an account at Datadog, automatically.
https://plugins.jenkins.io/datadog/
MIT License

Optimise memory consumption, CPU usage and disk writes #381

Closed. nikita-tkachenko-datadog closed this pull request 7 months ago.

nikita-tkachenko-datadog commented 7 months ago


What does this PR do?

This PR reduces the plugin's resource consumption: memory usage, CPU usage, and disk writes.

The motivation is that several users have reported that the plugin's overhead is too high.

The overhead is mostly caused by the way Jenkins' durability mechanism interacts with the plugin. By default, Jenkins saves the build state to disk after every step (e.g. after a pipeline stage finishes) and after every change (e.g. when an Action is added to or removed from the build or one of its steps). This is what makes builds durable: if the master node dies, it can be restarted, and the job resumes execution because its up-to-date state is deserialised from disk.
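To illustrate the save-on-change behaviour described above, here is a minimal sketch. `DurableBuild` and its methods are hypothetical stand-ins, not Jenkins' API: every mutation triggers a full Java serialisation of the object, and an in-memory buffer stands in for the file on disk. It only shows that each save rewrites the whole state, so saves grow as more data is attached.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a build that is persisted in full after every change. */
class DurableBuild implements Serializable {
    private static final long serialVersionUID = 1L;

    // Everything attached to the build, including plugin data, ends up in the saved state.
    private final List<String> actions = new ArrayList<>();

    // Save statistics are transient, so they are not part of what gets written.
    private transient int saveCount;
    private transient long bytesWritten;
    private transient int lastSaveSize;

    /** Each mutation triggers a save, mirroring a save-on-every-change policy. */
    void addAction(String action) {
        actions.add(action);
        save();
    }

    /** Serialises the whole object; the in-memory buffer stands in for the file on disk. */
    private void save() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        saveCount++;
        lastSaveSize = buffer.size();
        bytesWritten += lastSaveSize;
    }

    int saveCount() { return saveCount; }
    long bytesWritten() { return bytesWritten; }
    int lastSaveSize() { return lastSaveSize; }
}
```

Calling `addAction` three times produces three saves, and each save contains everything added so far rather than just the latest change.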

The problem is that the state saved to disk includes all the data the plugin associates with the build. This data (step and stage execution data, build metadata, etc.) is added to the build as it executes and is retained until the build finishes, because the trace for the entire build is submitted only once the build is finished.

The problem is aggravated by the fact that the data for individual stages and steps is stored in the build object, so every step and every change causes all of that data to be saved again. As a result, the same data is written to disk over and over, including the data for stages that have already completed.

This contributes to the overhead in terms of memory consumption, CPU usage, and disk writes.

These points have been corroborated by several JFR (Java Flight Recorder) profiles obtained from different customers.
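The write amplification described above can be put into rough numbers with a simplified model. It assumes (hypothetically) that the build is saved once per stage and that every stage record has the same serialised size; real saves happen more often and records vary in size. If every save rewrites the records of all stages seen so far, save k writes k records, so the total grows quadratically; if each save only holds the current stage, it grows linearly.

```java
/**
 * Simplified, hypothetical model: one save per stage, and every stage
 * record has the same serialised size (recordBytes).
 */
final class WriteAmplification {
    private WriteAmplification() {}

    /** Every save rewrites the records of all stages seen so far, finished or not. */
    static long bytesIfAllStagesRetained(int stages, long recordBytes) {
        long total = 0;
        for (int k = 1; k <= stages; k++) {
            total += k * recordBytes; // save k contains k records
        }
        return total; // equals recordBytes * stages * (stages + 1) / 2
    }

    /** Each save only contains the record of the current stage. */
    static long bytesIfOnlyCurrentStageRetained(int stages, long recordBytes) {
        return stages * recordBytes;
    }
}
```

For 100 stages with 1 KiB (1024-byte) records, the first model writes 5,171,200 bytes and the second 102,400 bytes.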

To address the issue, the plugin now retains step and stage data for the shortest possible period instead of keeping it until the build finishes. Because of this, the way data is propagated from steps to stages to builds (including, for example, the execution node or Git metadata) has been reworked as well.
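A minimal sketch of short-lived retention with upward propagation, under stated assumptions: `Span` and `ShortLivedTracker` are hypothetical and not the plugin's actual classes, a finished span is assumed to be handed off as soon as it completes (here only its name is recorded), and tags such as the execution node are assumed to propagate upwards without overwriting a value the parent already has. It shows metadata flowing from steps to stages to the build while each finished span is released immediately.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical span in a build -> stage -> step hierarchy. */
final class Span {
    final String name;
    final Span parent;
    final Map<String, String> tags = new HashMap<>();

    Span(String name, Span parent) {
        this.name = name;
        this.parent = parent;
    }
}

/** Hypothetical tracker that holds a span only while it is running. */
final class ShortLivedTracker {
    private final Deque<Span> running = new ArrayDeque<>();
    private final List<String> handedOff = new ArrayList<>();

    /** Starts a span nested under the innermost running span (null parent for the build). */
    Span start(String name) {
        Span span = new Span(name, running.peek());
        running.push(span);
        return span;
    }

    /** Finishes the innermost span, copies its tags to the parent, then releases it. */
    void finish(Span span) {
        if (running.peek() != span) {
            throw new IllegalStateException("spans must finish innermost-first");
        }
        running.pop();
        if (span.parent != null) {
            // Propagate upwards without overwriting values the parent already has.
            span.tags.forEach(span.parent.tags::putIfAbsent);
        }
        handedOff.add(span.name); // stand-in for submitting or persisting the span
    }

    int retainedSpans() { return running.size(); }
    List<String> handedOff() { return handedOff; }
}
```

A stack only models sequentially nested spans; parallel branches would need per-branch bookkeeping, which this sketch leaves out.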

Verification Process

Since the behavioural changes were minimal, the existing tests were used to verify that there are no regressions. Some tests had to be adjusted because of the way Git metadata is now gathered: they have to perform an actual SCM checkout to better emulate real-life pipelines.

In addition, manual tests were executed in a dockerized Jenkins instance, covering the following:

- Freestyle build trace submitted via webhooks
- Freestyle build trace submitted via EVP proxy
- Freestyle build trace submitted via APM track
- Pipeline trace submitted via webhooks
- Pipeline trace submitted via EVP proxy
- Pipeline trace submitted via APM track
