timja / jenkins-gh-issues-poc-06-18


[JENKINS-34281] Queue isn't saved on any shutdown #3044

Closed timja closed 8 years ago

timja commented 8 years ago

When Jenkins is stopped, the queue file is created, but it is empty. No information about running jobs or jobs pending in the queue is saved.

It essentially saves a file like:


42

This behavior is a regression from Jenkins 2.0 beta2.


Originally reported by kwhetstone, imported from: Queue isn't saved on any shutdown
  • assignee: svanoort
  • status: Resolved
  • priority: Critical
  • resolution: Fixed
  • resolved: 2016-04-23T15:17:57+00:00
  • imported: 2022/01/10
timja commented 8 years ago

svanoort:

This was my comment from https://issues.jenkins-ci.org/browse/JENKINS-33926 when I discovered this in testfest yesterday:

100% reproducible with the 2.0-rc WAR running on Mac.

Test case:

node {
    echo 'stuffs'
    sleep 100
    stage 'Stage 2'
}

Result:

timja commented 8 years ago

svanoort:

I had tested on CJE 1.625.16.1: the queue is preserved when Jenkins is killed with Ctrl+C. I just rechecked, and that is also true on 1.642.x, so this is clearly a regression since then.

timja commented 8 years ago

kwhetstone:

A further note on this: if the queue is full of freestyle jobs, they are neither persisted nor restarted.

timja commented 8 years ago

kwhetstone:

After some searching, I found that the queue is persisted correctly on upgrades from 1.x, but not on a fresh install of 2.0. The only reason a job wouldn't be saved is if it is a TransientTask, so perhaps some change now causes jobs to be listed as TransientTasks, or everything is somehow being treated as one.

timja commented 8 years ago

svanoort:

So, I'm looking at how and when the save occurs. I found that if I invoke the queue save method from the script console (Jenkins.instance.queue.save()), it correctly persists enqueued jobs:

Given that save() builds the persisted list with the loop below, the result we're seeing appears possible only if this loop lists no tasks, if the queue instance in use has been cleared by another method call, or if save() isn't invoked at the appropriate time (and a previous write overwrote it):

for (Item item : getItems()) {
    if (item.task instanceof TransientTask) continue;
    state.items.add(item);
}
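The loop above skips only TransientTask entries, so by itself it cannot produce an empty queue file unless every queued task is transient. A minimal, self-contained Java sketch of that filtering step, assuming simplified stand-in types (Task, TransientTask, Item and persistableItems here are hypothetical, not Jenkins' real Queue classes):

```java
import java.util.ArrayList;
import java.util.List;

public class QueueSaveSketch {
    // Hypothetical stand-ins for Jenkins' Queue.Task and Queue.TransientTask.
    interface Task { String name(); }
    interface TransientTask extends Task {}

    record Item(Task task) {}

    // Mirrors the loop in save(): copy every item except transient ones.
    static List<Item> persistableItems(List<Item> items) {
        List<Item> state = new ArrayList<>();
        for (Item item : items) {
            if (item.task() instanceof TransientTask) continue;
            state.add(item);
        }
        return state;
    }

    public static void main(String[] args) {
        Task freestyle = () -> "freestyle-job";
        TransientTask temp = () -> "transient-task";
        List<Item> queue = List.of(new Item(freestyle), new Item(temp));
        List<Item> saved = persistableItems(queue);
        System.out.println("Saving queue with " + saved.size() + " items!");
    }
}
```

With one ordinary task and one transient task queued, the sketch keeps exactly one item. An empty result therefore points at the input list (what getItems() returns) rather than at this filter.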

Adding log statements to save() confirmed that it is the latter. I added this to Queue.java, line

System.out.println("Saving queue with " + state.items.size() + " items!");

This yielded the following on shutdown via Ctrl+C:

Saving queue with 0 items!

Edit: on further checking, the items are not TransientTasks. Upon save, getItems().length returns 0, and so does this.waitingList.size(): the queue has been cleared before saving.

timja commented 8 years ago

svanoort:

The root cause is a permission-check failure in the getItems() method during shutdown. Items that fail the permission check are not included in the getItems() results (I verified this by logging, and disabling the security check there fixes queue persistence).

timja commented 8 years ago

svanoort:

When running via mvn -f war hudson-dev:run, which behaves like an upgrade, I get different behavior: the last freestyle project in the queue is not started from the restored queue on restart, but it is not lost either; it gets marked as aborted (bizarre, no?).

timja commented 8 years ago

svanoort:

If I also log the current user's Authentication during shutdown, I get:

Principal: org.acegisecurity.providers.anonymous.AnonymousAuthenticationToken@ffffffc4: Username: anonymous; Password: [PROTECTED]; Authenticated: true; Details: null; Granted Authorities: anonymous
Authentication name: anonymous

Unfortunately, the setup wizard disables anonymous read access by default via setAllowAnonymousRead(false), as of https://github.com/jenkinsci/jenkins/pull/2042/files#diff-f65b8a70854ca1cc6c12397eee54d279R62
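To make the failure mode concrete, here is a hedged Java sketch of a permission-filtered getItems(), assuming the shutdown thread runs as anonymous and anonymous read access is off. CURRENT_USER, allowAnonymousRead and canRead are hypothetical stand-ins; real Jenkins checks each item's ACL against the thread's security context rather than a plain string:

```java
import java.util.ArrayList;
import java.util.List;

public class PermissionFilterSketch {
    // Hypothetical identity of the calling thread (Jenkins uses a SecurityContext).
    static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "anonymous");

    // Hypothetical equivalent of the "allow anonymous read access" setting.
    static boolean allowAnonymousRead = false;

    static boolean canRead(String user) {
        if ("SYSTEM".equals(user)) return true;               // system user bypasses checks
        if ("anonymous".equals(user)) return allowAnonymousRead;
        return true;                                         // any logged-in user may read
    }

    // Like getItems(): silently omits items the caller may not see.
    // (Real code checks each item's own ACL; here every item shares one rule.)
    static List<String> getItems(List<String> queued) {
        List<String> visible = new ArrayList<>();
        for (String item : queued) {
            if (canRead(CURRENT_USER.get())) visible.add(item);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> queued = List.of("job-a", "job-b");
        System.out.println("Saving queue with " + getItems(queued).size() + " items!");
        CURRENT_USER.set("SYSTEM");
        System.out.println("Saving queue with " + getItems(queued).size() + " items!");
    }
}
```

Run as anonymous with allowAnonymousRead false, the sketch reports 0 items even though two are queued; run as SYSTEM, it reports 2. Because the filter fails silently, nothing in the log hints that items were hidden, which matches the "Saving queue with 0 items!" output above.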

timja commented 8 years ago

svanoort:

So, to summarize:

TL;DR: Jenkins wasn't running some forms of shutdown as the system user, and when we removed anonymous read access as part of the secure-out-of-the-box PR (https://github.com/jenkinsci/jenkins/pull/2042/files#diff-f65b8a70854ca1cc6c12397eee54d279R62), shutdown could no longer see build items in order to persist them.
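The fix direction, running shutdown as the system user, amounts to temporarily swapping the calling thread's identity and restoring it afterwards. The sketch below models that idea with a hypothetical ThreadLocal and a hypothetical runAs helper; it is not the code from the commit, which works with Jenkins' own ACL and security-context classes (ACL.SYSTEM):

```java
import java.util.List;
import java.util.function.Supplier;

public class ShutdownAsSystemSketch {
    // Hypothetical per-thread identity, standing in for the security context.
    static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "anonymous");

    // Runs an action as the given user and always restores the previous identity.
    static <T> T runAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    // Hypothetical queue save that only sees items when the caller may read them.
    static int saveQueue(List<String> queued) {
        boolean canRead = !"anonymous".equals(CURRENT_USER.get());
        int saved = canRead ? queued.size() : 0;
        System.out.println("Saving queue with " + saved + " items!");
        return saved;
    }

    public static void main(String[] args) {
        List<String> queued = List.of("job-a", "job-b");
        saveQueue(queued);                           // shutdown thread is anonymous: 0 items
        runAs("SYSTEM", () -> saveQueue(queued));    // elevated: 2 items
        System.out.println("Restored identity: " + CURRENT_USER.get());
    }
}
```

The try/finally is the important part: the previous identity is restored even if the save throws, so the elevated identity does not leak into whatever that thread does next.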

timja commented 8 years ago

danielbeck:

To confirm: the same queue-saving failure should occur on 1.x when using matrix security that grants no permissions to anonymous. Right?

timja commented 8 years ago

svanoort:

danielbeck Yes, I have just confirmed that 1.x fails to save the queue when anonymous lacks read access (when killed by Ctrl+C, of course). I'm surprised nobody has reported this yet...

timja commented 8 years ago

danielbeck:

svanoort Could you please determine whether this is a recent change? Which Jenkins 1.x versions fail to save the queue in this situation? Some ideas: 1.653/PR 2103; 1.638/SECURITY-186.

timja commented 8 years ago

svanoort:

danielbeck It is present in 1.625.3 but not in 1.609.3. I don't have a specific commit that triggers it yet, though I do see that SECURITY-186 added permission checking.

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Sam Van Oort
Path:
core/src/main/java/hudson/WebAppMain.java
http://jenkins-ci.org/commit/jenkins/543b947e79d862149a1e52a529c1944e53943b25
Log:
Fix JENKINS-34281 by running shutdown as system user

Replicates the key bit of dad9b04422d572003c83f0fc4543060a70971cc0
But does not include the likely-unneeded ACL.SYSTEM change before System.exit(0)

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Daniel Beck
Path:
core/src/main/java/hudson/WebAppMain.java
http://jenkins-ci.org/commit/jenkins/84698df80b055e15bdc8c1176167d858b10c513e
Log:
Merge pull request #2280 from svanoort/fix-shutdown-permissions-issue-JENKINS-34281-mk3

[FIX JENKINS-34281] Run shutdown as system user

Compare: https://github.com/jenkinsci/jenkins/compare/5d90f816566a...84698df80b05

timja commented 8 years ago

danielbeck:

Resolved towards 2.1.

timja commented 8 years ago

jglick:

I found another effect of this bug on Pipeline, unrelated to the queue:

java.io.IOException: no such WorkflowJob p
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.run(WorkflowRun.java:707)
    at org.jenkinsci.plugins.workflow.job.WorkflowRun$Owner.get(WorkflowRun.java:719)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:63)
    at org.jenkinsci.plugins.workflow.flow.FlowExecutionList$1.computeNext(FlowExecutionList.java:55)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.suspendAll(CpsFlowExecution.java:890)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
    at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
    at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:282)
    at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:210)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
    at jenkins.model.Jenkins$21.execute(Jenkins.java:3005)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:128)
    at jenkins.model.Jenkins$21.execute(Jenkins.java:3005)
    at org.jvnet.hudson.reactor.Reactor$Node.runIfPossible(Reactor.java:139)
    at org.jvnet.hudson.reactor.Reactor.execute(Reactor.java:263)
    at jenkins.model.Jenkins._cleanUpRunTerminators(Jenkins.java:3002)
    at jenkins.model.Jenkins.cleanUp(Jenkins.java:2924)
    at hudson.WebAppMain.contextDestroyed(WebAppMain.java:373)
    at …

During shutdown with Ctrl-C in a system with no anonymous read access, $JENKINS_HOME/org.jenkinsci.plugins.workflow.flow.FlowExecutionList.xml may omit running builds.
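One plausible reading of the trace, sketched in Java under stated assumptions: a lookup by name that respects read permission returns null for a job anonymous cannot see, the owner lookup turns that into IOException("no such WorkflowJob p"), and the iterator logs and skips that entry, so it never reaches FlowExecutionList.xml. The JOBS registry, the toy getItemByFullName (named after Jenkins.getItemByFullName but not its implementation) and the skip-on-error loop are hypothetical simplifications, not the workflow plugins' code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FlowListSketch {
    static String currentUser = "anonymous";

    // Hypothetical job registry: job name -> whether anonymous users can see it.
    static final Map<String, Boolean> JOBS = Map.of("p", false, "public-job", true);

    // Permission-aware lookup: jobs the caller cannot read look nonexistent.
    static String getItemByFullName(String name) {
        Boolean anonymousVisible = JOBS.get(name);
        if (anonymousVisible == null) return null;
        if ("anonymous".equals(currentUser) && !anonymousVisible) return null;
        return name;
    }

    static String resolveOwner(String jobName) throws IOException {
        String job = getItemByFullName(jobName);
        if (job == null) throw new IOException("no such WorkflowJob " + jobName);
        return job;
    }

    // Assumption: owners that fail to resolve are logged and skipped,
    // so they silently drop out of the list that gets saved.
    static List<String> runningExecutions(List<String> owners) {
        List<String> resolved = new ArrayList<>();
        for (String owner : owners) {
            try {
                resolved.add(resolveOwner(owner));
            } catch (IOException e) {
                System.out.println("WARNING: " + e.getMessage());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> owners = List.of("p", "public-job");
        System.out.println("As anonymous: " + runningExecutions(owners));
        currentUser = "SYSTEM";
        System.out.println("As SYSTEM: " + runningExecutions(owners));
    }
}
```

Under anonymous, only the publicly visible job survives; under SYSTEM, both do, which is why running the lookup with elevated rights addresses this symptom as well.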

timja commented 8 years ago

svanoort:

jglick So... one fix solved multiple bugs? Best thing to hear!

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
http://jenkins-ci.org/commit/workflow-cps-plugin/d20d2b3f493d8bdda36e77e07c91d586e5789d7d
Log:
JENKINS-34281 It is possible for terminators to be called under anonymous access prior to 2.1, leading to lost FlowExecutionList entries.

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/cps/CpsFlowExecution.java
http://jenkins-ci.org/commit/workflow-cps-plugin/22d28c28e53448b08f750a9f7aa27bdee07dfc77
Log:
Merge pull request #19 from jglick/suspendAll-JENKINS-34281

JENKINS-34281 Lost FlowExecutionList entries

Compare: https://github.com/jenkinsci/workflow-cps-plugin/compare/865c21b72203...22d28c28e534

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/steps/SleepStep.java
http://jenkins-ci.org/commit/workflow-basic-steps-plugin/3da248c55c3bb81ae94f31038298d9a2beba368b
Log:
JENKINS-34281 Indicate if we are still sleeping after a resume.

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionList.java
http://jenkins-ci.org/commit/workflow-api-plugin/85c413b5fde6686d988c1bd2f1f763b583825fc3
Log:
JENKINS-34281 More logging about what happens when loading FlowExecutionList.

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/steps/SleepStep.java
http://jenkins-ci.org/commit/workflow-basic-steps-plugin/f79ae94a60cfcd62d9248e893c4a36573b7d1c61
Log:
Merge pull request #8 from jglick/sleep-info-JENKINS-34281

JENKINS-34281 Indicate if we are still sleeping after a resume

Compare: https://github.com/jenkinsci/workflow-basic-steps-plugin/compare/7360c8d48bdb...f79ae94a60cf

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/flow/FlowExecutionList.java
http://jenkins-ci.org/commit/workflow-api-plugin/f749920b37b7fcd3ecc1d5e85433c50a0a910cd4
Log:
Merge pull request #4 from jglick/FlowExecutionList-JENKINS-34281

JENKINS-34281 More logging about what happens when loading FlowExecutionList

Compare: https://github.com/jenkinsci/workflow-api-plugin/compare/ac98823111be...f749920b37b7

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
http://jenkins-ci.org/commit/workflow-job-plugin/5c8499e19aae9a47207cde1072b1775230edf15b
Log:
JENKINS-34281 More general fix of CpsFlowExecution.suspendAll ACL bug which could help with InputAction.loadExecutions as well.

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
http://jenkins-ci.org/commit/workflow-job-plugin/baf8f40be50d83ef1574f7470c399b2952bcb95b
Log:
Merge pull request #11 from jglick/Owner-ACL-JENKINS-34281

JENKINS-34281 More general fix of CpsFlowExecution.suspendAll ACL bug

Compare: https://github.com/jenkinsci/workflow-job-plugin/compare/e94fd55ee4f5...baf8f40be50d

timja commented 8 years ago

scm_issue_link:

Code changed in jenkins
User: Jesse Glick
Path:
job/src/main/java/org/jenkinsci/plugins/workflow/job/WorkflowRun.java
http://jenkins-ci.org/commit/pipeline-plugin/20ad6a2e677764ec3e38d81c3e058f0070591a8c
Log:
JENKINS-34281 More general fix of CpsFlowExecution.suspendAll ACL bug
Backports https://github.com/jenkinsci/workflow-job-plugin/pull/11.

timja commented 2 years ago

[Originally duplicated by: JENKINS-34256]

timja commented 2 years ago

[Originally duplicated by: JENKINS-35213]

timja commented 2 years ago

[Originally related to: JENKINS-30909]