nodejs / build

Better build and test infra for Node.

`No valid crumb was included in the request` error trying to save job configurations #1852

Closed richardlau closed 5 years ago

richardlau commented 5 years ago

Edit: @sam-github also reports the same error trying to edit https://ci.nodejs.org/job/node-stress-single-test/configure

Getting the following error trying to save changes to https://ci.nodejs.org/job/nodereport-continuous-integration/:

HTTP ERROR 403

Problem accessing /job/nodereport-continuous-integration/configSubmit. Reason:

    No valid crumb was included in the request

This seems to be specific to this job: I'm able to edit and save the citgm-smoker-* jobs I also have access to without errors.

Probably related: in the configure tab I can see the groups, e.g. nodejs*postmortem-admins, as the page is loading, but once loaded they become ERROR (screenshot).

Also, strangely, the job configuration history page (https://ci.nodejs.org/job/nodereport-continuous-integration/jobConfigHistory/) doesn't list the edits I know I made to this job towards the end of last year (https://github.com/nodejs/build/issues/1421) (screenshot).
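For context on the error above: Jenkins' CSRF protection expects each state-changing request to carry a "crumb" token, which the browser form normally supplies. The sketch below shows, under assumptions, how a script would obtain that token from the `/crumbIssuer/api/json` endpoint and turn it into a request header. The sample response body and the crumb value `abc123` are made up for illustration, and the network helper is not exercised here. It does not explain why the web form on ci.nodejs.org lost its crumb.

```python
import json
import urllib.request


def crumb_header(crumb_json: str) -> dict:
    """Turn the JSON body returned by /crumbIssuer/api/json into a header dict."""
    data = json.loads(crumb_json)
    # Jenkins reports both the header name and the token value.
    return {data["crumbRequestField"]: data["crumb"]}


def fetch_crumb_header(base_url: str, opener=urllib.request.urlopen) -> dict:
    """Fetch a crumb from a Jenkins instance (network call, not run in this sketch)."""
    with opener(base_url.rstrip("/") + "/crumbIssuer/api/json") as resp:
        return crumb_header(resp.read().decode("utf-8"))


# Example response body with a made-up crumb value.
sample = (
    '{"_class": "hudson.security.csrf.DefaultCrumbIssuer", '
    '"crumb": "abc123", "crumbRequestField": "Jenkins-Crumb"}'
)
header = crumb_header(sample)
print(header)
```

A request that omits this header (or form field), or sends a crumb Jenkins no longer accepts, is rejected with the 403 shown above.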

mhdawson commented 5 years ago

@rvagg @jbergstroem did you set up the integration between Jenkins and our GitHub groups? Looks like there is something wrong on that front.

mhdawson commented 5 years ago

Hmm, seemed to be ok for the N-API testing jobs. Will try to re-add Richard and see what happens.

mhdawson commented 5 years ago

I don't seem to be able to save any changes for that job :(

richardlau commented 5 years ago

Also, if I expand the ERROR in the job configuration I get: (screenshot)

mhdawson commented 5 years ago

Same error even when trying to configure a copy of the job.

mhdawson commented 5 years ago

Richard, if I create a completely new job, can you then copy over what's required for the job?

richardlau commented 5 years ago

> Richard, if I create a completely new job, can you then copy over what's required for the job?

I think I can, mostly, but I might need a Jenkins admin to allow the versionSelector Groovy script.

mhdawson commented 5 years ago

This is the new job https://ci.nodejs.org/job/nodereport-continuous-integration-new/

Please configure what you can, and then let me know if I need to do something for the versionSelector Groovy script.

richardlau commented 5 years ago

> This is the new job https://ci.nodejs.org/job/nodereport-continuous-integration-new/
>
> Please configure what you can, and then let me know if I need to do something for the versionSelector Groovy script.

@mhdawson I don't appear to have permission to edit the job configuration (screenshot).

I am in post-mortem-admins: https://github.com/orgs/nodejs/teams/post-mortem-admins/members

sam-github commented 5 years ago

I am having a similar issue with https://ci.nodejs.org/job/node-stress-single-test, I can't save configuration changes, same error about "no valid crumb".

mhdawson commented 5 years ago

Sorry, it seems I had misspelled the post-mortem-admins group name. Can you try now?

mhdawson commented 5 years ago

Took me a few tries to get it right. Hopefully good now.

sam-github commented 5 years ago

Nothing changed, still lacking crumbs on https://ci.nodejs.org/job/node-stress-single-test/configSubmit

I tried to log out, just to see if I need to log out and log back in to get the new perms, but logging out seems not to work.

richardlau commented 5 years ago

> Sorry, it seems I had misspelled the post-mortem-admins group name. Can you try now?

I can edit and save now, but I ran into the "no valid crumb" error once on this job too (though I was able to reload the page and then save, which I've not been able to do for the original job). It looks like the new job isn't a matrix configuration job? It's missing the "Configuration matrix" section where we select the labels for the platforms to test on.

sam-github commented 5 years ago

https://ci.nodejs.org/job/node-stress-single-test/configure has the same problem, it's like the job config got wiped: the label section that I want to add centos7-ppcle to isn't there anymore.

@richardlau I hope I'm not hijacking your thread! :-) I think it's the same issue.

richardlau commented 5 years ago

> @richardlau I hope I'm not hijacking your thread! :-) I think it's the same issue.

That's fine. I'm of the same opinion that it looks like the same issue. My suspicion is that a Jenkins update has somehow broken something.

mhdawson commented 5 years ago

@richardlau if you need me to do something with respect to the job please reach out to me through internal slack.

richardlau commented 5 years ago

> https://ci.nodejs.org/job/node-stress-single-test/configure has the same problem, it's like the job config got wiped: the label section that I want to add centos7-ppcle to isn't there anymore.
>
> @richardlau I hope I'm not hijacking your thread! :-) I think it's the same issue.

Job configuration being wiped is, I think, an even more serious issue. @Trott noted in #node-build on IRC that the job has lost its "Build with parameters". I also appear to have lost permission to edit that job (I had it before via https://github.com/nodejs/build/issues/1582).

rvagg commented 5 years ago

I don't know if this will make a difference but I've just upgraded and restarted Jenkins

richardlau commented 5 years ago

> I don't know if this will make a difference but I've just upgraded and restarted Jenkins

Unfortunately no difference.

sam-github commented 5 years ago

@nodejs/build Configuration for https://ci.nodejs.org/job/node-stress-single-test/ appears to be totally wiped. Do we have backups?

sam-github commented 5 years ago

@jbergstroem :point_up: Rumour has it you set up a Jenkins backup cron job, do you know anything about it?

rvagg commented 5 years ago

We have a backup on infra-joyent-smartos15-x64-1, but I don't believe it contains anything that ci.nodejs.org doesn't already have.

https://ci.nodejs.org/job/node-stress-single-test/jobConfigHistory/ should have all the information required. Most notably the first change from Refael's original to Richard's on the 25th that deleted most of the config: https://ci.nodejs.org/job/node-stress-single-test/jobConfigHistory/showDiffFiles?timestamp1=2019-05-30_20-19-21&timestamp2=2019-06-25_18-22-55

I did a restore of the original one and it came back fine. But then did a save and got the crumb error. Also this:

(screenshot, 2019-06-29 21:22:10)

The first one with my name is the restore, the second is my save, the next "unknown"s follow the same pattern from both Richard's save and Sam's saves on this job which are also followed by "unknown"s. And now Refael's original is gone from the list.

But the original, `/var/lib/jenkins/config-history/jobs/node-stress-single-test/2019-05-30_20-19-21/config.xml`, is still on the server. So I've copied it manually back into `/var/lib/jenkins/jobs/node-stress-single-test/config.xml`, and the job is now back to its original state again (https://ci.nodejs.org/job/node-stress-single-test/configure), but I dare not save it.
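The manual copy described above can be sketched as a small script. This is an illustration only, not what was actually run on the server: the function name `restore_job_config` and the `config.xml.bak` backup are hypothetical additions, and the demo works in a throwaway directory rather than `/var/lib/jenkins`. The directory layout follows the two paths quoted in the comment.

```python
import shutil
import tempfile
from pathlib import Path


def restore_job_config(jenkins_home: Path, job: str, timestamp: str) -> Path:
    """Copy a saved config.xml snapshot back over the live job config."""
    snapshot = jenkins_home / "config-history" / "jobs" / job / timestamp / "config.xml"
    live = jenkins_home / "jobs" / job / "config.xml"
    if not snapshot.is_file():
        raise FileNotFoundError(snapshot)
    live.parent.mkdir(parents=True, exist_ok=True)
    if live.exists():
        # Keep the current file so the copy can be undone (hypothetical safety step).
        shutil.copy2(live, live.with_name("config.xml.bak"))
    shutil.copy2(snapshot, live)
    return live


# Demonstration against a temporary directory standing in for the Jenkins home.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    job = "node-stress-single-test"
    stamp = "2019-05-30_20-19-21"
    snap_dir = home / "config-history" / "jobs" / job / stamp
    snap_dir.mkdir(parents=True)
    (snap_dir / "config.xml").write_text("<project>original</project>")
    job_dir = home / "jobs" / job
    job_dir.mkdir(parents=True)
    (job_dir / "config.xml").write_text("<project>wiped</project>")

    restored = restore_job_config(home, job, stamp)
    restored_text = restored.read_text()
    backup_text = (job_dir / "config.xml.bak").read_text()

print(restored_text)
```

On a live server, Jenkins would probably also need to reload its configuration from disk (or be restarted) before the restored file is picked up; the comment does not say which step, if any, was taken.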

I've just run plugin updates, I went through a few changelogs for the ones that needed updating but couldn't see anything interesting.

The best suggestion I have is to try and manually rebuild this job from scratch and see if we can end up with a stable configuration? Perhaps there's something in the config that causes it to go bad (logs don't say anything interesting btw).

richardlau commented 5 years ago

> @richardlau if you need me to do something with respect to the job please reach out to me through internal slack.

Spoke to Michael about the new https://ci.nodejs.org/job/nodereport-continuous-integration-new/ job not being a multi-configuration job. Another new job https://ci.nodejs.org/job/nodereport-continuous-integration-latest/, which is a multi-configuration job, has been created and I've manually copied across the details from https://ci.nodejs.org/job/nodereport-continuous-integration/.

Test runs are green against:

- Node.js 13 nightly: https://ci.nodejs.org/job/nodereport-continuous-integration-latest/12/
- Node.js 8: https://ci.nodejs.org/job/nodereport-continuous-integration-latest/14/

I'll kick off 10 and 12 runs later but it looks like the new job (so far) is working as expected and is editable (unlike the existing https://ci.nodejs.org/job/nodereport-continuous-integration/). So recreating the job from scratch may also work for https://ci.nodejs.org/job/node-stress-single-test (the one Sam was having issues with).

I've deleted the unused https://ci.nodejs.org/job/nodereport-continuous-integration-new/.

richardlau commented 5 years ago

Michael has added https://ci.nodejs.org/job/nodereport-continuous-integration-latest/ to https://ci.nodejs.org/view/post-mortem/ (🙇) and I've run the job quite a few times now on node-report's master branch and a PR and all looks well. I'll delete the non-saveable https://ci.nodejs.org/job/nodereport-continuous-integration/ job in a few days and then I think we're good for the node-report job.

The stress job will need to be rebuilt.

mhdawson commented 5 years ago

I think @sam-github had been working to get the backup so that we could restore the stress job from that.

richardlau commented 5 years ago

> I think @sam-github had been working to get the backup so that we could restore the stress job from that.

Isn't that what Rod attempted in https://github.com/nodejs/build/issues/1852#issuecomment-506950690?

sam-github commented 5 years ago

@mhdawson doing as Rod suggests in https://github.com/nodejs/build/issues/1852#issuecomment-506950690 is definitely on my TODO list, but has not made it high enough for me to work on it yet.

sam-github commented 5 years ago

Good news:

  1. I rebuilt node-stress-single-test, and I can save it, and it has no "crumb" errors
  2. I ran a ppcle-ubuntu1404 build, it passed
  3. I added centos7-ppcle, it's running, I expect it to pass

Bad news:

  1. The job config history is bizarre, I don't understand why there are two changes by "unknown" almost every time I save a change. Check it out at https://ci.nodejs.org/job/node-stress-single-test-sr/jobConfigHistory/ First change shows no xml diff, second adds some metadata that I don't understand, and weirdly, when I save it again, both the changes I make are saved... and also the auto-added metadata is removed...until the gremlin aka "unknown" adds it back in. Is this normal Jenkins operation? Is it a problem? I've no idea
  2. I don't know how to replicate the project-based security settings. https://ci.nodejs.org/job/node-stress-single-test/configure has a line for @Trott , and then an ERROR line, so I don't know what it used to be. In my new job, I added specific perms for just me, sam-github, with same perms as Trott. Weirdly, the first time I did that, I got the dreaded "crumb" error. I was pretty sure the job was borked, but the line for me wasn't actually saved. I added it again, success, no crumb errors since. Still, I don't understand why there are specific config lines for users, I didn't need specific config perms to edit the job, I don't know why anyone else would need specific perms either. What should I do here?

I guess I'll wait a day, and then rename the original job to -old, and rename my -sr job to replace the original, unless anyone has any comments here.
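One way to see exactly what the "unknown" saves change would be to diff consecutive snapshots on disk. The sketch below assumes the history for a job is stored as timestamped directories, each holding a `config.xml`, as in the path Rod quoted earlier in the thread. The helper names, the sample timestamps, and the `<extra/>` element are hypothetical stand-ins, not the real metadata Jenkins adds.

```python
import difflib
import tempfile
from pathlib import Path


def history_snapshots(history_dir: Path) -> list:
    """Return snapshot directories that contain a config.xml, oldest first."""
    # Names like 2019-05-30_20-19-21 sort chronologically as plain strings.
    return sorted(d for d in history_dir.iterdir() if (d / "config.xml").is_file())


def diff_snapshots(older: Path, newer: Path) -> list:
    """Unified diff between the config.xml files of two snapshots."""
    a = (older / "config.xml").read_text().splitlines()
    b = (newer / "config.xml").read_text().splitlines()
    return list(
        difflib.unified_diff(a, b, fromfile=older.name, tofile=newer.name, lineterm="")
    )


# Demonstration with two made-up snapshots in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    hist = Path(tmp)
    samples = [
        ("2019-07-04_10-00-00", "<project>\n  <label>a</label>\n</project>\n"),
        ("2019-07-04_10-00-05", "<project>\n  <label>a</label>\n  <extra/>\n</project>\n"),
    ]
    for name, body in samples:
        (hist / name).mkdir()
        (hist / name / "config.xml").write_text(body)
    snaps = history_snapshots(hist)
    snapshot_names = [s.name for s in snaps]
    changes = diff_snapshots(snaps[0], snaps[1])

print("\n".join(changes))
```

Running the same comparison over each adjacent pair of snapshots would show which elements appear and disappear between a user's save and the following "unknown" entry.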

sam-github commented 5 years ago

@richardlau I think enough time has passed for you to do the renaming you describe in https://github.com/nodejs/build/issues/1852#issuecomment-508538778

richardlau commented 5 years ago

> @richardlau I think enough time has passed for you to do the renaming you describe in #1852 (comment)

Thanks for the reminder. https://ci.nodejs.org/job/nodereport-continuous-integration/ is now gone!

richardlau commented 5 years ago

> Good news:
>
> 1. I rebuilt node-stress-single-test, and I can save it, and it has no "crumb" errors
> 2. I ran a ppcle-ubuntu1404 build, it passed
> 3. I added centos7-ppcle, it's running, I expect it to pass
>
> Bad news:
>
> 1. The job config history is bizarre, I don't understand why there are two changes by "unknown" almost every time I save a change. Check it out at https://ci.nodejs.org/job/node-stress-single-test-sr/jobConfigHistory/ First change shows no xml diff, second adds some metadata that I don't understand, and weirdly, when I save it again, both the changes I make are saved... and also the auto-added metadata is removed...until the gremlin aka "unknown" adds it back in. Is this normal Jenkins operation? Is it a problem? I've no idea

I think it's something new-ish (and possibly related to the issue at hand) and looks like it's what ends up in https://ci.nodejs.org/job/node-stress-single-test-sr/metadata/.

> 2. I don't know how to replicate the `project-based security` settings. https://ci.nodejs.org/job/node-stress-single-test/configure has a line for @Trott , and then an ERROR line, so I don't know what it used to be. In my new job, I added specific perms for just me, sam-github, with same perms as Trott. Weirdly, the first time I did that, I got the dreaded "crumb" error. I was pretty sure the job was borked, but the line for me wasn't actually saved. I added it again, success, no crumb errors since. Still, I don't understand why there are specific config lines for users, I didn't need specific config perms to edit the job, I don't know why anyone else would need specific perms either. What should I do here?

I see this as the page is loading, before my entry gets replaced with ERROR (screenshot).

I wonder if individual people being included (as opposed to teams) is a commonality between the jobs that exhibited the No valid crumb was included in the request error. I was added to the job as an individual so that I could edit it (https://github.com/nodejs/build/issues/1582) as there are no teams for that job with edit permissions that I could join (and I otherwise don't have general job editing permissions as I'm not a Jenkins admin).
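The suspicion above could be checked by scanning job `config.xml` files for permission entries granted to individuals rather than teams. The sketch below rests on two assumptions that are not confirmed in this thread: that project-based security is stored as `<permission>` elements of the form `permission-id:principal`, and that GitHub teams appear as `org*team` (as in the `nodejs*postmortem-admins` entry mentioned earlier). The function name and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_permissions(config_xml: str) -> dict:
    """Group <permission> entries into team and individual principals."""
    result = {"teams": set(), "individuals": set()}
    for node in ET.fromstring(config_xml).iter("permission"):
        # Assumed entry format: "<permission id>:<principal>".
        _, _, principal = (node.text or "").strip().partition(":")
        if not principal:
            continue
        # Assumed convention: GitHub teams are written as "org*team".
        key = "teams" if "*" in principal else "individuals"
        result[key].add(principal)
    return result


# Made-up job configuration fragment for illustration.
sample_config = """
<matrix-project>
  <properties>
    <hudson.security.AuthorizationMatrixProperty>
      <permission>hudson.model.Item.Configure:nodejs*postmortem-admins</permission>
      <permission>hudson.model.Item.Configure:richardlau</permission>
      <permission>hudson.model.Item.Read:richardlau</permission>
    </hudson.security.AuthorizationMatrixProperty>
  </properties>
</matrix-project>
"""

found = classify_permissions(sample_config)
print(sorted(found["teams"]), sorted(found["individuals"]))
```

If every job that hit the crumb error lists at least one individual and the unaffected jobs list only teams, that would support the hypothesis, though it would not prove the cause.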

sam-github commented 5 years ago

> I wonder if individual people being included (as opposed to teams) is a commonality between the jobs that exhibited the No valid crumb was included in the request error.

I share that suspicion, though not grounded on much. @richardlau You should just ask to join the WG, you do plenty of work on build, and the access would be useful to you, clearly.

rvagg commented 5 years ago

yeah, @richardlau you should totally join Build, you're one of the most active people here!

sam-github commented 5 years ago

fwiw, I archived the original stress-test job (and added a link from its description to here), and renamed my -sr one to the base name.