Optimise memory consumption, CPU usage and disk writes

Requirements for Contributing to this repository

Fill out the template below. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
The pull request must fix only one issue at a time.
The pull request must update the test suite to demonstrate the changed functionality.
After you create the pull request, all status checks must pass before a maintainer reviews your contribution. For more details, please see CONTRIBUTING.

What does this PR do?

This PR optimises the plugin's resource consumption:

heap memory
CPU usage
disk I/O

The motivation is that several users have complained that the overhead of the plugin is too high.

The overhead is mostly caused by the way Jenkins' durability mechanism interacts with the plugin. By default, Jenkins saves the state of the build to disk at every step (e.g. after finishing a pipeline stage) and with every change (e.g. after adding or removing an Action on the build or on one of the build's steps). This is needed to ensure durability: if the master node dies, it can be restarted and the job will resume execution, because its up-to-date state will be deserialised from disk.

The problem is that the state saved to disk includes all the data that the plugin associates with the build. This data is added to the build as it executes and is retained until the very end of the build execution. It includes stage and step execution data, build metadata, etc. It is retained until the end because the trace for the entire build is submitted only once the build has finished.

The problem is aggravated by the fact that the data for individual stages and steps is stored in the build object, so all of that data has to be saved with every step and every change. As a result, the same data is written to disk over and over, including the data for stages that have already completed.
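The growth in disk writes can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one save per finished stage, the same payload size for every stage, and hypothetical names throughout (`WriteAmplificationSketch` and its methods are not part of Jenkins or the plugin). It compares saving everything collected so far on each stage completion with writing each stage's data only once.

```java
/** Toy model of repeated writes. Hypothetical names; not Jenkins or plugin code. */
class WriteAmplificationSketch {

    /**
     * Bytes written when all stage data lives in the build object,
     * so every finished stage causes everything collected so far to be saved again.
     */
    static long bytesWrittenBuildLevel(int stages, long bytesPerStage) {
        long total = 0;
        long accumulated = 0;
        for (int i = 0; i < stages; i++) {
            accumulated += bytesPerStage; // data of the stage that just finished
            total += accumulated;         // the whole accumulated state is written again
        }
        return total;
    }

    /** Bytes written when each stage's data is written only once. */
    static long bytesWrittenOncePerStage(int stages, long bytesPerStage) {
        return stages * bytesPerStage;
    }

    public static void main(String[] args) {
        int stages = 100;            // assumed pipeline size
        long bytesPerStage = 10_000; // assumed payload per stage
        System.out.println("build-level:    " + bytesWrittenBuildLevel(stages, bytesPerStage));
        System.out.println("once per stage: " + bytesWrittenOncePerStage(stages, bytesPerStage));
    }
}
```

With the assumed 100 stages of 10,000 bytes each, the model writes 50,500,000 bytes in the first case and 1,000,000 bytes in the second. The first figure grows quadratically with the number of stages, the second linearly. Real builds trigger saves more often than once per stage, so the numbers are illustrative only.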

This is how it contributes to the overhead:

heap memory - for pipelines that take a long time to finish, all of their data sits in the heap until the very end of the build, increasing pressure on the GC. This is made worse by the fact that data from long-running pipelines is likely to survive multiple garbage collections and be promoted to the old generation.
CPU usage - the data has to be serialised. By default Jenkins uses reflection-based serialisers that have to examine class metadata in order to determine which fields to serialise and how. This is very CPU-intensive.
disk I/O - as explained above, the data for the entire build is written and re-written to disk many times. In addition, the data contains a lot of duplication, e.g. every step includes its own copy of the pipeline's environment variables.

The points above have been corroborated with several JFR profiles obtained from different customers.

To address the issue, the following is changed:

As soon as a stage or a step in a pipeline finishes, its span is serialised and added to the next batch that will be submitted. Once this is done, all of the data associated with that stage/step is removed.
Stage/step-specific data is saved in the corresponding build nodes, rather than in the build object. As a result, it is serialised to disk only when there are changes related to those specific build nodes. Unrelated changes to the build or to other build nodes will not cause that data to be serialised.
Custom converters are implemented for all data that is serialised. They are more performant than the standard reflection-based converters since there is no need to examine class metadata to determine what needs to be serialised / deserialised.
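The difference between a reflection-based and a hand-written converter can be sketched in plain Java. This is an illustration only: `StepData`, `reflective` and `explicit` are hypothetical names, the output is a simple map rather than Jenkins' on-disk format, and the plugin's actual converters are not shown here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration; not the plugin's real classes or converters. */
class ConverterSketch {

    /** Stand-in for the data kept for one pipeline step. */
    static final class StepData {
        final String name;
        final long startMillis;
        final long endMillis;

        StepData(String name, long startMillis, long endMillis) {
            this.name = name;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }
    }

    /** Reflection-based: fields are discovered at run time from class metadata. */
    static Map<String, String> reflective(Object o) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Field f : o.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip anything that is not instance state
            }
            try {
                f.setAccessible(true);
                out.put(f.getName(), String.valueOf(f.get(o)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    /** Hand-written: the fields are known at compile time, so no metadata lookup is needed. */
    static Map<String, String> explicit(StepData d) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("name", d.name);
        out.put("startMillis", Long.toString(d.startMillis));
        out.put("endMillis", Long.toString(d.endMillis));
        return out;
    }

    public static void main(String[] args) {
        StepData d = new StepData("checkout", 10L, 25L);
        System.out.println(reflective(d).equals(explicit(d)));
    }
}
```

Both methods produce the same key/value pairs, but the explicit version does so with direct field reads. The trade-off is that a hand-written converter has to be kept in sync with the class whenever fields are added or removed.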

Since the plugin now retains stage and step data for the shortest possible period, the way data is propagated from steps to stages to builds (for example, the execution node or Git metadata) has been reworked as well.
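The idea of keeping step data only until the step finishes can be sketched as follows. This is a minimal model with hypothetical names (`EagerFlushSketch`, `stepStarted`, `tag`, `stepFinished`, `drainBatch`) and a made-up string payload; it does not reproduce the plugin's actual classes or span format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of "serialise on finish, then release"; not plugin code. */
class EagerFlushSketch {

    // Data for steps that are still running.
    private final Map<String, Map<String, String>> inFlight = new HashMap<>();
    // Already serialised payloads waiting for the next submission.
    private final List<String> pendingBatch = new ArrayList<>();

    void stepStarted(String stepId) {
        inFlight.put(stepId, new HashMap<>());
    }

    void tag(String stepId, String key, String value) {
        inFlight.computeIfAbsent(stepId, id -> new HashMap<>()).put(key, value);
    }

    /** Serialises the step into a compact payload and drops its in-memory data. */
    void stepFinished(String stepId) {
        Map<String, String> data = inFlight.remove(stepId);
        if (data == null) {
            return; // unknown or already finished step
        }
        // TreeMap gives a stable key order for the illustrative payload.
        pendingBatch.add(stepId + "=" + new TreeMap<>(data));
    }

    /** Returns the payloads collected so far and empties the batch. */
    List<String> drainBatch() {
        List<String> batch = new ArrayList<>(pendingBatch);
        pendingBatch.clear();
        return batch;
    }

    int retainedSteps() {
        return inFlight.size();
    }
}
```

In this model the memory held per step is released as soon as `stepFinished` runs, and only the serialised payload stays around until the batch is drained.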

Description of the Change

Alternate Designs

Possible Drawbacks

Verification Process

Since the changes in behaviour were minimal, existing tests were used to verify that there are no regressions. Some of the tests had to be adjusted because of the way Git metadata is now gathered: the tests have to perform an actual SCM checkout to better emulate real-life pipelines.

In addition, manual tests were executed in a dockerized Jenkins instance, covering the following:

Freestyle build trace submitted via webhooks
Freestyle build trace submitted via EVP proxy
Freestyle build trace submitted via APM track
Pipeline trace submitted via webhooks
Pipeline trace submitted via EVP proxy
Pipeline trace submitted via APM track

Additional Notes

Release Notes

Review checklist (to be filled by reviewers)

[ ] Feature or bug fix MUST have appropriate tests (unit, integration, etc...)
[ ] PR title must be written as a CHANGELOG entry (see why)
[ ] File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
[ ] PR must have one changelog/ label attached. If applicable it should have the backward-incompatible label attached.
[ ] PR should not have do-not-merge/ label attached.
[ ] If applicable, issue must have kind/ and severity/ labels attached at least.

jenkinsci / datadog-plugin