jenkins-infra / helpdesk

Open your Infrastructure related issues here for the Jenkins project
https://github.com/jenkins-infra/helpdesk/issues/new/choose

[infra.ci.jenkins.io] Cron triggered build are not started since 2022-02-26 #2803

Closed dduportal closed 2 years ago

dduportal commented 2 years ago

Service

infra.ci.jenkins.io

Summary

Since the 24th of February 2022, we have not seen any build started by a cron trigger (examples: jenkins-infra/kubernetes-management every 30 min, or the Terraform jobs daily).

It seems correlated with the weekly 2.336 update, but we cannot be sure, as other elements also changed on the 23rd/24th: it's a hunch, not a proof.

Reproduction steps

No response

MarkEWaite commented 2 years ago

I confirmed that cron jobs worked as expected for me in Jenkins 2.336, at least for a job that was configured to run every two minutes.

dduportal commented 2 years ago
[Screenshot 2022-03-01 at 18:01:29]

The error seems to come from the ternary operator. We are digging in this direction.

lemeurherve commented 2 years ago

What we had before and worked:

(1)

pipeline {
  agent none

  options {
    buildDiscarder(logRotator(numToKeepStr: '10'))
    timeout(time: 30, unit: 'MINUTES')
    disableConcurrentBuilds()
  }

  triggers {
    cron (env.BRANCH_NAME == 'main' ? 'H/30 * * * *' : '')
  }

  stages {

What we've tried:

(2)

pipeline {
  agent none

  options {
    buildDiscarder(logRotator(numToKeepStr: '10'))
    timeout(time: 30, unit: 'MINUTES')
    disableConcurrentBuilds()
  }

  triggers {
    cron (env.BRANCH_IS_PRIMARY ? 'H/30 * * * *' : '')
  }

  stages {

(3)

String cronPeriod = env.BRANCH_IS_PRIMARY ? 'H/30 * * * *' : ''

pipeline {
  agent none

  options {
    buildDiscarder(logRotator(numToKeepStr: '10'))
    timeout(time: 30, unit: 'MINUTES')
    disableConcurrentBuilds()
  }

  triggers {
    cron (cronPeriod)
  }

  stages {

Note: for this last try, an echo "cronPeriod: ${cronPeriod}" later in one of the first steps returns the expected value:

cronPeriod: H/30 * * * *

cron ('H/30 * * * *') alone triggers cron jobs.

dduportal commented 2 years ago

Cc @MarkEWaite can you check on your own setup if you see the same behavior with a declarative pipeline using a ternary form?

jglick commented 2 years ago

Try

  triggers {
    cron("${BRANCH_NAME == 'main' ? 'H/30 * * * *' : ''}")
  }

perhaps.

If you find yourself trying tricks like this, just move to Scripted.

dduportal commented 2 years ago

If you find yourself trying tricks like this, just move to Scripted.

That sounds like a lot of pain for only 1 feature to be honest. I would use the crontab on all the branches instead.

Scripted is powerful and your tip makes sense, but honestly, my brain is not wired at all for "coding my pipeline", even after 7 years using scripted.

dduportal commented 2 years ago

try

triggers { cron("${BRANCH_NAME == 'main' ? 'H/30 * * * *' : ''}") }

Thanks, we're going to try this one.

Weird thing: I wanted to check whether this issue also happens with scripted, and it appears that yes, it does.

The pipeline https://github.com/jenkins-infra/aws/blob/main/Jenkinsfile_k8s uses the shared library https://github.com/jenkins-infra/pipeline-library/blob/master/vars/terraform.groovy#L33 which is fully scripted. There have been no builds on this one since the 24th, which is weird.

We are now trying the recommendation that @jglick and @jnord (thanks folks!) gave us to use scripted, on a simpler pipeline not involving the shared library (to remove as many moving pieces as possible) + checking the Jenkins log carefully.

dduportal commented 2 years ago

OK, so it does not seem related to any pipeline syntax: even the plain classic syntax cron ('H/30 * * * *') has not worked since our "experiments" yesterday.

Currently capturing the logs to see what is happening.

dduportal commented 2 years ago

New checks:

 <triggers>
        <string>hudson.triggers.TimerTrigger</string>
  </triggers>

is set up as expected, but the UI does not show the option as selected (in "view configuration").

=> next step: try deleting the pod to force a full startup phase, and also a full "backup + rollback to the version of 3 weeks ago"

jglick commented 2 years ago

That config.xml does not look right. It should contain e.g.

  <properties>
    <org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>
      <triggers>
        <hudson.triggers.TimerTrigger>
          <spec>H/30 * * * *</spec>
        </hudson.triggers.TimerTrigger>
      </triggers>
    </org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>
  </properties>

That is what I get from running 2.319.3, installing pipeline-model-definition, and running a (standalone) Pipeline defined as

pipeline {
    agent none
    triggers {
        cron 'H/30 * * * *'
    }
    stages {
        stage('x') {
            steps {
                echo 'ok'
            }
        }
    }
}

Similarly when I create a multibranch Pipeline with a Git branch source, after waiting for branch indexing and the automatic initial build of the master branch project, though in that case there is also an org.jenkinsci.plugins.workflow.multibranch.BranchJobProperty entry as expected.

jglick commented 2 years ago

Oh and

diff --git Jenkinsfile Jenkinsfile
index e20b16b..d03e940 100644
--- Jenkinsfile
+++ Jenkinsfile
@@ -1,7 +1,7 @@
 pipeline {
     agent none
     triggers {
-        cron 'H/30 * * * *'
+        cron "${BRANCH_NAME == 'master' ? 'H/20 * * * *' : ''}"
     }
     stages {
         stage('x') {

did indeed work as expected for me after pushing to master and also creating a branch based on that. The master branch project is using the H/20 schedule, and the other branch project is using an empty schedule.

Also tried * * * * * on master and confirmed that builds are kicked off every minute on the master branch project but not on the other. “works on my machine”

jglick commented 2 years ago

Upgraded to 2.337 and updated Pipeline plugins accordingly. All still seems to be working.

Oh I think I see what you were confused by.

    <org.jenkinsci.plugins.pipeline.modeldefinition.actions.DeclarativeJobPropertyTrackerAction plugin="pipeline-model-definition@…">
      <jobProperties/>
      <triggers>
        <string>hudson.triggers.TimerTrigger</string>
      </triggers>
      <parameters/>
      <options/>
    </org.jenkinsci.plugins.pipeline.modeldefinition.actions.DeclarativeJobPropertyTrackerAction>

is correct. This is not the definition of the trigger, though; this is merely recording the fact that Declarative syntax did at some point specify a trigger, rather than it being via GUI configuration (which is of course impossible for a branch project anyway, but never mind that). The actual trigger definition is in <properties>, not <actions>.
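
To make the <properties> vs <actions> distinction concrete, here is a minimal sketch for the Jenkins script console that prints the trigger actually defined on one branch project. The job name is hypothetical and must be replaced; the sketch assumes the workflow-job plugin is installed and has not been run against infra.ci.jenkins.io.

```groovy
import jenkins.model.Jenkins
import org.jenkinsci.plugins.workflow.job.WorkflowJob
import org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty

// Hypothetical full name of a branch project; replace with a real one.
String jobName = 'my-folder/my-multibranch-job/main'

WorkflowJob job = Jenkins.get().getItemByFullName(jobName, WorkflowJob)
if (job == null) {
  println "No Pipeline job named '${jobName}'"
  return
}

// This property is what ends up under <properties> in config.xml.
PipelineTriggersJobProperty prop = job.getProperty(PipelineTriggersJobProperty)
if (prop == null || prop.triggers.isEmpty()) {
  println "${jobName}: no trigger defined in <properties>"
} else {
  // Print each configured trigger with its schedule.
  prop.triggers.each { t ->
    println "${jobName}: ${t.class.name} spec='${t.spec}'"
  }
}
```

If this reports no trigger while the DeclarativeJobPropertyTrackerAction still lists hudson.triggers.TimerTrigger, the job only remembers that a trigger was once declared and has no schedule of its own.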

dduportal commented 2 years ago

@jglick oh interesting thanks! We're going to dive in that direction.

As for now, the only success was "rollback to 2.335 & the plugins defined in https://github.com/jenkins-infra/docker-jenkins-weekly/releases/tag/0.42.3-2.335" + "Delete the whole multibranch job from UI & reload JCasc to recreate it".

dduportal commented 2 years ago
<org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>
      <triggers/>
    </org.jenkinsci.plugins.workflow.job.properties.PipelineTriggersJobProperty>

which means that the pipeline does NOT write the correct config (and the config <-> UI <-> behavior is coherent)

dduportal commented 2 years ago

OK, we were able to find a way to reproduce the behavior, at least on this job on this instance:

jglick commented 2 years ago

Offhand sounds like a bug for pipeline-model-definition-plugin (Declarative). Did you check behavior of the corresponding Scripted syntax using the properties step?
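
For readers unfamiliar with it, a minimal sketch of what the Scripted form with the properties step might look like, mirroring the options of snippet (1) above. It assumes a multibranch project (so that env.BRANCH_NAME is set) and is an illustration only, not the Jenkinsfile used on infra.ci.jenkins.io.

```groovy
// Scripted Pipeline sketch: triggers are declared through the `properties` step.
// Assumption: multibranch project, so env.BRANCH_NAME is defined.
String cronSpec = env.BRANCH_NAME == 'main' ? 'H/30 * * * *' : ''

properties([
  buildDiscarder(logRotator(numToKeepStr: '10')),
  disableConcurrentBuilds(),
  // An empty spec means "no schedule" for non-main branches.
  pipelineTriggers([cron(cronSpec)])
])

timeout(time: 30, unit: 'MINUTES') {
  // `echo` does not need an agent, which matches `agent none` in the Declarative version.
  echo "cron spec for ${env.BRANCH_NAME}: '${cronSpec}'"
}
```

The trigger is only written to the job configuration once a build has actually executed the properties step, so the first scheduled run can only happen after one manual or indexing-triggered build.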

dduportal commented 2 years ago

Offhand sounds like a bug for pipeline-model-definition-plugin (Declarative). Did you check behavior of the corresponding Scripted syntax using the properties step?

Currently trying 2 scripted cases:

dduportal commented 2 years ago

I confirm that the same problem happens with pipeline in full scripted:

There is definitely something fishy.

dduportal commented 2 years ago

I need help on this one from a Jenkins expert contributor.

At the same time, I'm trying to "bisect" which elements (core, plugins, or a combination) could help me pin down when the issue happens.

Working on the following angles:

dduportal commented 2 years ago

Bug is reproduced with the 0.43.0-2.336 tag version (Core 2.336 with these plugins: https://github.com/jenkins-infra/docker-jenkins-weekly/blob/0.43.0-2.336/plugins.txt).

Trying with 2.335 and latest plugins:

dduportal commented 2 years ago

Pinning to 2.335: the bug is there. It means that it's either a plugin, or the setup.

timja commented 2 years ago

Good to know it wasn't dark theme 😂

lemeurherve commented 2 years ago

Good to know it wasn't dark theme 😂

"The cron was lost in the dark"

dduportal commented 2 years ago

OK, it seems that the culprit was the Pipeline: Basic Steps plugin. Currently trying to confirm this.

jglick commented 2 years ago

You mean some update to workflow-basic-steps? Seems unlikely on the face of it, since this behavior of defining triggers is in workflow-job + pipeline-model-definition.

dduportal commented 2 years ago

@jglick thanks for the pointers! I was (again) too quick to draw conclusions: I might have found something, but it will wait for next week.

dduportal commented 2 years ago

Damn, the bug appears randomly whatever plugin combination I try. It's a mess, not sure how to handle this: we need help (we can delegate admin access to the instance, do whatever is needed).

Let's see after the weekend.

dduportal commented 2 years ago

Thanks a lot @daniel-beck for triple-checking!

dduportal commented 2 years ago

Closing this issue as we were able to identify a short term fix + the PR https://github.com/jenkins-infra/pipeline-library/pull/315 was opened for long term.

Good news: it's not a bug in the weekly core or in any plugin!

Bad news: it's a UX issue for non-Jenkins-experts :'(

Many, many thanks to everyone who helped and spent time on this to unblock us.