jenkinsci / datadog-plugin

A Jenkins plugin used to forward metrics, events, and service checks to an account at Datadog, automatically.
https://plugins.jenkins.io/datadog/
MIT License

All classic jobs in jenkins turn into failed on datadog plugin v6.0.0 #393

Closed bitle closed 5 months ago

bitle commented 6 months ago

Describe the bug

Today I upgraded all plugins in Jenkins to the latest version, including the most recent version of the Datadog plugin. After Jenkins restarted, all successful builds of classic jobs show up as failed, although I can still see in the logs that the result was SUCCESS. Here's what I found in the logs:

ConversionException:
---- Debugging information ----
cause-exception : java.lang.NumberFormatException
cause-message : For input string: "https://jenkins2.dev.lockhart.io/job/Infrastructure/job/fmc/job/cdfmc_rds_deploy/job/deploy_fmc_with_rds/2750/"
class : java.lang.Long
required-type : java.lang.Long
converter-type : com.thoughtworks.xstream.converters.SingleValueConverterWrapper
wrapped-converter : com.thoughtworks.xstream.converters.basic.LongConverter
path : /build/actions/org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction/buildData/buildUrl
line number : 137
class[1] : org.datadog.jenkins.plugins.datadog.traces.message.TraceSpan$TraceSpanContext
required-type[1] : org.datadog.jenkins.plugins.datadog.traces.message.TraceSpan$TraceSpanContext
converter-type[1] : hudson.util.XStream2$AssociatedConverterImpl
class[2] : org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction
required-type[2] : org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction
-------------------------------,
CannotResolveClassException: buildParameters,
CannotResolveClassException: charsetName,
CannotResolveClassException: nodeName,
CannotResolveClassException: jobName,
CannotResolveClassException: baseJobName,
CannotResolveClassException: buildTag,
CannotResolveClassException: jenkinsUrl,
CannotResolveClassException: executorNumber,
CannotResolveClassException: javaHome,
CannotResolveClassException: branch,
CannotResolveClassException: gitUrl,
CannotResolveClassException: gitCommit,
CannotResolveClassException: isCompleted,
CannotResolveClassException: hostname,
CannotResolveClassException: userId,
CannotResolveClassException: tags,
CannotResolveClassException: startTime,
CannotResolveClassException: endTime,
ConversionException: Refusing to unmarshal duration for security reasons; see https://www.jenkins.io/redirect/class-filter/
---- Debugging information ----
message : Refusing to unmarshal duration for security reasons; see https://www.jenkins.io/redirect/class-filter/
class : java.time.Duration
required-type : java.time.Duration
converter-type : hudson.util.XStream2$BlacklistedTypesConverter
path : /build/actions/org.datadog.jenkins.plugins.datadog.traces.BuildSpanAction/buildData/duration
line number : 197
-------------------------------,
CannotResolveClassException: millisInQueue,
CannotResolveClassException: buildSpanContext

I reverted back to the previous version and it fixed my issues.
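
For triage, an error summary like the one above can be grouped by kind to check whether every failure points at the same plugin class. This is only a minimal sketch, assuming the log text has been copied out of Jenkins as a single string; the function name `summarize_old_data_errors` and the shortened `sample` string are hypothetical and not part of Jenkins or the plugin.

```python
import re


def summarize_old_data_errors(log_text: str) -> dict:
    """Collect unresolved field names and XStream paths from a log excerpt."""
    # Field names reported as "CannotResolveClassException: <name>".
    unresolved = re.findall(r"CannotResolveClassException: (\w+)", log_text)
    # Paths reported as "path : <xpath-like location>" in the debugging blocks.
    paths = re.findall(r"path\s*:\s*(\S+)", log_text)
    return {"unresolved": unresolved, "paths": paths}


# Shortened excerpt of the log from this report (hypothetical sample input).
sample = (
    "cause-exception : java.lang.NumberFormatException "
    "path : /build/actions/org.datadog.jenkins.plugins.datadog.traces."
    "BuildSpanAction/buildData/buildUrl "
    "line number : 137 -------------------------------, "
    "CannotResolveClassException: buildParameters, "
    "CannotResolveClassException: charsetName"
)

summary = summarize_old_data_errors(sample)
print(summary["unresolved"])
print(summary["paths"])
```

If every extracted path sits under `BuildSpanAction`, as it does in the log above, that suggests the unreadable data all belongs to the Datadog plugin's stored action rather than to the build record itself.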

To Reproduce

I didn't try to reproduce this issue. I can provide my job configs and build history if needed.

Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior A clear and concise description of what you expected to happen.

Screenshots Screenshot 2024-02-03 at 9 30 33 AM

Environment and Versions (please complete the following information):

Jenkins 2.426.3
Datadog 5.6.2 -> 6.0.0

Additional context Add any other context about the problem here.

jeohist commented 6 months ago

We experienced the same issue, but unfortunately reverting to the old version did not resolve our issues.

nikita-tkachenko-datadog commented 6 months ago

@bitle, @jeohist, thank you for reporting this. The issue was resolved in release v6.0.1. Please try updating and let me know if the issue persists. Thank you!

lemeurherve commented 5 months ago

@nikita-tkachenko-datadog we upgraded the ci.jenkins.io instance from 6.0.0 to 6.0.1 after the stackoverflow error we encountered this afternoon (cf. https://github.com/jenkinsci/datadog-plugin/issues/389#issuecomment-1932271114), but unfortunately all previous jobs are still shown as "failed" and dated 1970.

Ex: https://ci.jenkins.io/job/Infra/job/pipeline-library/job/master/ (previous builds have successfully finished while they appear failed)

image

nikita-tkachenko-datadog commented 5 months ago

Hi @lemeurherve,

Could you please provide some additional info?

Thank you!

lemeurherve commented 5 months ago

I'll provide you these elements first thing in the morning tomorrow.

nikita-tkachenko-datadog commented 5 months ago

As a side note, https://issues.jenkins.io/browse/JENKINS-66328 describes a similar issue. Some of the reports in there are from 25/01/202 (which is before Datadog plugin v6.0.0 was released) and the reporters claim that they're not using the Datadog plugin.

So while there is a plugin data deserialisation problem in v6.0.0, it is possible that the date/status display issue is caused by something else.

dduportal commented 5 months ago

or in the Manage Old Data screen? If yes, could you please share them?

First (quick) feedback on ci.jenkins.io (I'll let @lemeurherve provide more details with logs and/or build.xml excerpts): after upgrading datadog from 6.0.0 to 6.0.1, we had the following warning in the "Manage Old Data" screen:

Capture d’écran 2024-02-07 à 16 03 22

nikita-tkachenko-datadog commented 5 months ago

Thanks for the details, @dduportal! I have managed to reproduce this in a local Jenkins instance.

The CannotResolveClassException checks out: it indeed refers to a class that is no longer there in the new release of the plugin. "It is okay to leave unreadable data in these items/records, as Jenkins will simply ignore it" - that part was also true for me. While I saw the error in the Manage Old Data screen, the build in question had the correct date and status, and looked normal.

The v6.0.0 version of the plugin had a different issue, where it could not deserialise one of the plugin's action classes because its format had changed. In some cases this led to the build data stored on disk being rewritten with default values (timestamp 0, status FAILED, etc.).
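
To find builds whose on-disk record may have been rewritten this way, one option is to check each `build.xml` for a missing or zero timestamp. The following is a rough sketch only: it assumes the build record keeps its start time in a top-level `<timestamp>` element, and the helper name `looks_reset` and the sample documents are hypothetical, not taken from the plugin or from any affected instance.

```python
import re
import xml.etree.ElementTree as ET


def looks_reset(build_xml: str) -> bool:
    """Return True if the build record has no timestamp or a timestamp of 0."""
    # Assumption: the file may start with an XML 1.1 declaration, which
    # Python's expat-based parser may reject, so the declaration is dropped.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", build_xml)
    root = ET.fromstring(body)
    # Only look at direct children of the root, not at nested action data.
    timestamp = (root.findtext("timestamp") or "").strip()
    return timestamp in ("", "0")


# Hypothetical, heavily trimmed build records for illustration.
healthy = (
    "<build><actions/><timestamp>1706980000000</timestamp>"
    "<result>SUCCESS</result></build>"
)
rewritten = (
    "<?xml version='1.1' encoding='UTF-8'?>\n"
    "<build><actions/><timestamp>0</timestamp></build>"
)

print(looks_reset(healthy))
print(looks_reset(rewritten))
```

A check like this only flags candidates; whether the original values can be restored depends on having a backup of the build directories taken before the upgrade.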

To sum up: