Add caching, tracing, logging, and batching to the plugin

craigtmoore commented 3 months ago

Adds caching using the Caffeine cache library, logging with java.util.logging, and tracing with OpenTelemetry, and inserts the data into the database in batches. Some builds on our Jenkins instance have over 29,000 test results, so we made these changes to improve performance.
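As a minimal sketch of the java.util.logging side (not the plugin's actual code), the class below passes a `Supplier<String>` to `Logger.fine`, so the message string is only built when FINE logging is enabled. The class name `StorageLoggingSketch`, the `publish` parameters, and the `messagesBuilt` counter are hypothetical; the counter exists only to make the deferred evaluation observable.

```java
import java.util.logging.Logger;

// Hypothetical class; the PR only says a logger was added to DatabaseTestResultStorage.
class StorageLoggingSketch {
    private static final Logger LOGGER = Logger.getLogger(StorageLoggingSketch.class.getName());

    // Illustration only: counts how often the log message was actually built.
    static int messagesBuilt = 0;

    static void publish(String jobName, int buildNumber, int resultCount) {
        // The Supplier overload defers string building until the logger has
        // checked that FINE is enabled, so disabled debug logging stays cheap.
        LOGGER.fine(() -> {
            messagesBuilt++;
            return "Publishing " + resultCount + " test results for " + jobName + " #" + buildNumber;
        });
    }
}
```

With the default JDK logging configuration (root level INFO), the first call below builds no message; after raising this logger to FINE, it does.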

Fixes #412

Update DatabaseTestResultStorage:
- add caseResultsCache
- add packageResultsCache
- update publish() method to use batch updates to improve performance
- add @WithSpan annotation to each method
- add logger
Update DatabaseTestResultStorageTest:
- add basic unit-test with a mocked database
- move duplicate code segments to constants and methods
- print tables with proper column widths (and truncate values that are too large)
Update pom.xml:
- add OpenTelemetry dependencies (for tracing)
- add Caffeine dependency (for caching)
- add dependency on kotlin-stdlib-jdk8 to fix dependency resolution issues
- tie the hpi goal to the compile phase
Update docker-compose.yaml:
- add commented-out 'mysql' database config
- add Jaeger server
- add Jenkins server with the OpenTelemetry agent embedded
docker-compose-with-zipkin.yaml: same as docker-compose.yaml, but using Zipkin instead of Jaeger
Add Dockerfile: extends the Jenkins Docker image with the OpenTelemetry agent JAR file; used to build the theweatherman/jenkins:lts-jdk17-otel image.
Add setup_jenkins.sh: Bash script that installs the plugin and configures Jenkins to store JUnit results in the database (works with docker-compose.yaml)
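A minimal sketch of what batching in publish() could look like with plain JDBC. The `case_results` table, its columns, the `CaseRow` type, and `BATCH_SIZE` are hypothetical, since the PR does not show the real schema or code. Rows are added to one `PreparedStatement` with `addBatch()` and flushed with `executeBatch()` every `BATCH_SIZE` rows, all inside a single transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; table, column, and type names are assumptions.
class BatchInsertSketch {
    record CaseRow(String suiteName, String className, String testName, double durationSeconds) {}

    static final int BATCH_SIZE = 1000; // assumption: tune for the target database

    // Splits rows into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> chunk(List<T> rows, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            chunks.add(rows.subList(start, Math.min(start + batchSize, rows.size())));
        }
        return chunks;
    }

    // Reuses one PreparedStatement, sends rows in JDBC batches instead of one
    // executeUpdate() per test result, and commits once at the end.
    static void insertAll(Connection connection, List<CaseRow> rows) throws SQLException {
        String sql = "INSERT INTO case_results (suite, classname, testname, duration) VALUES (?, ?, ?, ?)";
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            for (List<CaseRow> batch : chunk(rows, BATCH_SIZE)) {
                for (CaseRow row : batch) {
                    statement.setString(1, row.suiteName());
                    statement.setString(2, row.className());
                    statement.setString(3, row.testName());
                    statement.setDouble(4, row.durationSeconds());
                    statement.addBatch();
                }
                statement.executeBatch(); // flush this chunk
            }
            connection.commit();
        } catch (SQLException e) {
            connection.rollback(); // undo partially inserted batches
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

With 29,000 results and a batch size of 1,000, this makes 29 `executeBatch()` calls instead of 29,000 `executeUpdate()` calls. With MySQL Connector/J, batched inserts are only rewritten into multi-row INSERT statements when `rewriteBatchedStatements=true` is set on the JDBC URL.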

Testing done

I re-ran the existing tests to verify that I introduced no regressions, and added a new unit test, getCaseResults_mockDatabase(), that verifies the case results are loaded correctly.

Submitter checklist

[x] Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
[x] Ensure that the pull request title represents the desired changelog entry
[x] Please describe what you did
[x] Link to relevant issues in GitHub or Jira
[NA] Link to relevant pull requests, esp. upstream and downstream changes
[x] Ensure you have provided tests - that demonstrates feature works or fixes the issue

craigtmoore commented 3 months ago

As far as testing goes, the unit-test suites at our company are quite large: over 29,000 test results in a single build. We're looking into using a plugin like this, but the performance was quite poor when we first tried it, which is why I looked into improving it. At first I deployed it using MySQL instead of PostgreSQL because I could not connect that database to Jenkins.

Once I was able to deploy it and connect it to a Dockerized MySQL database, I ran the following pipeline:

pipeline {
    agent any
    stages {
        stage('Extract test results') {
            steps {
                sh 'unzip -o /tmp/archive.zip -d build'
            }
        }
    }
    post {
        always {
            junit 'build/**/TEST-*.xml'
        }
    }
}

Basically, I downloaded the ZIP file containing the large set of test results from our production Jenkins server and copied it into the Jenkins container using:

docker cp archive.zip junit-sql-storage-plugin-jenkins-1:/tmp/archive.zip

Then I ran the pipeline, and storing the test results took a long time (nearly 3 minutes with a MySQL database). So I added OpenTelemetry to the plugin to see where the bottlenecks were, and added batching to the publish() method to reduce the time it takes to store the values.

This certainly helped, but loading the test results also took a really long time. I thought the queries might be too slow, but it turned out that the JUnit plugin was calling getAllPackageResults() over 1,500 times, and each call can take hundreds of milliseconds. So I added caching, which reduced the processing time to tens of microseconds per call.

I took it one step further and cached the value returned by the retrieveCaseResults() method (which I renamed getCaseResults()). This also helped a lot, because the case results query takes ~20 ms to run, and that adds up when there are over 29,000 test results. That query was also being called by many of the metadata methods, like:

- getFailedTests
- getSkippedTests
- getPassedTests

By caching the case results, we avoided running the query repeatedly and reduced the processing time to tens of microseconds.
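A minimal sketch of the caching idea, using a `ConcurrentHashMap` as a JDK-only stand-in for the Caffeine cache the PR actually uses. The `BuildKey`, `CaseResult`, and `Status` types and all method signatures are hypothetical. The expensive query runs at most once per build, and getFailedTests, getSkippedTests, and getPassedTests filter the cached list instead of re-running the query.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: a plain map replaces Caffeine so only the JDK is needed.
class CaseResultsCacheSketch {
    enum Status { PASSED, FAILED, SKIPPED }
    record CaseResult(String name, Status status) {}
    record BuildKey(String job, int build) {}

    private final Map<BuildKey, List<CaseResult>> cache = new ConcurrentHashMap<>();
    private final Function<BuildKey, List<CaseResult>> query; // stands in for the ~20 ms database query
    final AtomicInteger queryCount = new AtomicInteger();     // how many times the query really ran

    CaseResultsCacheSketch(Function<BuildKey, List<CaseResult>> query) {
        this.query = query;
    }

    // Runs the query at most once per build; later calls are a map lookup.
    List<CaseResult> getCaseResults(String job, int build) {
        return cache.computeIfAbsent(new BuildKey(job, build), key -> {
            queryCount.incrementAndGet();
            return List.copyOf(query.apply(key));
        });
    }

    List<CaseResult> getFailedTests(String job, int build) { return withStatus(job, build, Status.FAILED); }
    List<CaseResult> getSkippedTests(String job, int build) { return withStatus(job, build, Status.SKIPPED); }
    List<CaseResult> getPassedTests(String job, int build) { return withStatus(job, build, Status.PASSED); }

    // Each metadata method filters the cached list instead of querying again.
    private List<CaseResult> withStatus(String job, int build, Status status) {
        return getCaseResults(job, build).stream()
                .filter(result -> result.status() == status)
                .collect(Collectors.toList());
    }
}
```

Unlike Caffeine, a plain map never evicts entries, so it grows with every build that is viewed. With Caffeine, the equivalent is a `Cache` built via `Caffeine.newBuilder().maximumSize(...)` and looked up with `cache.get(key, mappingFunction)`, which bounds memory use.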

So to answer your comment: yes, I did a lot of testing to verify that my changes improved performance. The telemetry I added was also very useful for finding the bottlenecks.

craigtmoore commented 3 months ago

I think I've resolved all of your comments; please let me know if there is anything else. I really appreciate all of the feedback. I'm going to do a bit more testing, especially with the JUnit Attachments plugin.