jenkinsci / junit-sql-storage-plugin

https://plugins.jenkins.io/junit-sql-storage/
MIT License
6 stars 14 forks source link

Add caching, tracing, logging, and batching to the plugin #413

Closed craigtmoore closed 2 months ago

craigtmoore commented 3 months ago

Adds caching using Caffine cache, logging with java.util.logging, tracing using opentelemtry, and inserts the data into the database using batching. We have some builds on our Jenkins instance that have over 29,000 test results so we added these changes to optimize performance.

Fixes #412

Testing done

I re-used the existing tests to verify that I've introduced no new regressions, but also addeed a new unit-test to the getCaseResults_mockDatabase() method and verify that it loads the data correctly.

Submitter checklist

craigtmoore commented 3 months ago

As far as testing goes, the unit-tests at our company are quite large, over 29,000 test results in a single build. We're looking into using a plugin like this, but the performance was quite poor when first tried it (which is why I looked at improving the performance). At first I deployed it using mysql instead of postgres because I could not connect the database to Jenkins.

Once I was able to deploy it using and connect it to a dockerized mysql database, I ran the following pipeline:

pipeline {
    agent any
    stages {
        stage('Extract test results') {
            steps {
                sh 'unzip -o /tmp/archive.zip -d build'
            }
        }
    }
    post {
        always {
            junit 'build/**/TEST-*.xml'
        }
    }
}

Basically, I downloaded the zip file with the large number of test results from our production Jenkins server and then copied that to the Jenkins instance, using:

docker cp archive.zip junit-sql-storage-plugin-jenkins-1:/tmp/archive.zip

Then I ran the pipeline and it took a long time to store the test results (nearly 3 minutes to store the results using a mysql database). So I added open telemetry to the plugin so I could see where the bottlenecks were. I decided to add batching to the publish method to help reduce the time it takes to store the values. This certainly helped, but the other problem was loading the test results took a really long time. I thought it might have been that the queries were too slow, but it turned out that the junit plugin was calling the getAllPackageResults over 1500 times (which can takes 100s of milliseconds) so I added caching which reduced the processing time down to 10s of microseconds for each method call. I decided to take it one step further and cache the value returned by the retrieveCaseResults() method (which I renamed getCaseResults()), this also really helped because the case results query takes ~20ms to run and that takes a long time when there are over 29000 test results. Also, that query was being called by many of the meta-data methods, like:

So to answer your comment, yes I did a lot of testing to verify that my changes improved the performance. The telemetry that I added was also very useful in figuring out the bottle necks.

craigtmoore commented 3 months ago

I think I've resolved all of your comments, please let me know if there is any thing else. I really appreciate all of the feedback. I'm going to do a bit more testing, especially with junit attachments plugin.

timja commented 2 months ago

I don't think its StepHandler I think if the variables are propagated through createRemotePublisher and this code is copied then it would work: https://github.com/jenkinsci/opentelemetry-plugin/blob/82e3e1b2574a68864dcd05b43b95d0aeb7f41249/src/main/java/io/jenkins/plugins/opentelemetry/job/OtelEnvironmentContributorService.java#L41-L52

timja commented 2 months ago

Continued at https://github.com/jenkinsci/junit-sql-storage-plugin/pull/414

jonesbusy commented 2 months ago

Shoudn't the opentelemetry plugin be optional ? What about jenkins instance using SQL storage but not connected to an Opentelemetry collector ?

From what I remember few month ago is that having OpenTelemetry plugin installed will display some deak links on job/build page when the plugin is not configured

timja commented 2 months ago

Possibly, would be better to remove those dead links I think if the plugin isn't configured at all.

Plugins should be able to integrate without it adding a bunch of things to the UI

cyrille-leclerc commented 2 months ago

Thanks for your interest in the opentelemetry plugin. I think we could make it optional for simpler integration of otel in the other Jenkins plugins, I have a few checks to do. I'll get back to you ASAP

timja commented 2 months ago

just raised a PR https://github.com/jenkinsci/opentelemetry-plugin/pull/868

kuisathaverat commented 2 months ago

I couldn't see any TRACEPARENT env values being set

you have to enable the export of the environment variables that define the TRACEPARENT, but I not sure how this plugin get the Opentelemetry context, it probably has to retrieve it from the job not front and environment variable. We store the context in actions in the job.

timja commented 2 months ago

I was able to get it in https://github.com/jenkinsci/junit-sql-storage-plugin/pull/414

cyrille-leclerc commented 2 months ago

@craigtmoore @timja the way you integrate OpenTelemetry is great. GlobalOpenTelemetry.get() is the way to go. I'm looking at some improvements to this pattern GlobalOpenTelemetry.get(). I'll get back to you shortly.

I saw @timja' PR, Ill review it it asap, I have to understand the challenge you faced :-)

timja commented 2 months ago

Anything running on the controller I think would be quite straightforward, but running on agents is a bit more complicated, its not the nicest / cleanest implementation in https://github.com/jenkinsci/junit-sql-storage-plugin/pull/414