timja / jenkins-gh-issues-poc-06-18

[JENKINS-24836] Ability to fail a build when no executors are available #5504

Open timja opened 10 years ago

timja commented 10 years ago

When using Jenkins to run automatic tasks, it's almost foolproof. Unfortunately, there's a loophole: if a slave goes offline, the job will just stay in a waiting state. That's not acceptable for critical tasks (backups, etc.). They should either run on time or fail.

Thus, this feature request. Please add an option to fail a build when it cannot run.


Originally reported by brein12, imported from: Ability to fail a build when no executors are available
  • status: Open
  • priority: Major
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 10 years ago

danielbeck:

There is no build to fail if there's no executor available to run on. Please explain in more detail what you are asking for.

Wouldn't it be more useful to have a "queue watcher" and if certain items are queued longer than a certain timeout, send a notification?
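
As a rough illustration of the "queue watcher" idea, here is a minimal sketch that polls the Jenkins queue over its remote API and reports items that have waited longer than a timeout. It assumes the queue is readable at `/queue/api/json` and that each item carries `inQueueSince` (epoch milliseconds), `why`, and `task.name`; the function names, the timeout value, and the lack of authentication are assumptions made for the example, not an existing plugin or a definitive implementation.

```python
import json
import time
import urllib.request


def stale_queue_items(queue_json, max_wait_seconds, now_ms=None):
    """Return (task name, seconds waited, reason) for items queued too long.

    queue_json is the parsed response of <jenkins>/queue/api/json.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    stale = []
    for item in queue_json.get("items", []):
        # inQueueSince is assumed to be a timestamp in epoch milliseconds.
        waited_s = (now_ms - item["inQueueSince"]) / 1000
        if waited_s > max_wait_seconds:
            name = item.get("task", {}).get("name", "<unknown>")
            stale.append((name, waited_s, item.get("why")))
    return stale


def fetch_queue(jenkins_url):
    """Fetch the current queue; a real setup would likely need credentials."""
    url = jenkins_url.rstrip("/") + "/queue/api/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

A scheduler (cron, or a separate Jenkins job pinned to the master) could call `stale_queue_items(fetch_queue(url), 900)` and send a notification for each returned entry. Because it looks at queue items rather than node status, it only alerts when a job is actually delayed.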

timja commented 10 years ago

brein12:

Not sure what you mean by "there's no build to fail". It's right there, in the queue, "job #12345", waiting for the next available executor.
Anyway, I don't really see a difference. I'm asking for a way to be sure that either the job runs in time or I get a notification that it didn't. I guessed failing it would be the natural way. "Queue watcher" would do just as well, I suppose.

timja commented 10 years ago

danielbeck:

Before a build is started by starting execution on an executor, there is no build, just a queue item. Those cannot have a result like builds, and nothing about them is recorded either. If you cancel one, no "failed build" or "canceled build" entry is created, and no build numbers are assigned either.

Is a node being/going offline just an example, or is this the one condition that's relevant? If the latter, plugins like https://wiki.jenkins-ci.org/display/JENKINS/Extreme+Notification+Plugin or http://jenkins-enterprise.cloudbees.com/docs/user-guide-bundle/nodes-plus.html would solve this.

timja commented 10 years ago

brein12:

There are other conditions, but they can be worked around with plugins. This one cannot.
No, the plugins that you mentioned don't exactly solve the problem. The node status isn't relevant. Job status is.
(To elaborate, for example, the node might itself be launched only periodically. Or have some issues with connectivity. Or resources. Or something else. In that case using a plugin that notifies about node availability will result in a number of alerts, which will render the notification system useless).

timja commented 10 years ago

danielbeck:

Please explain the other conditions etc. so I get a more complete picture of what this is about. It may be possible to implement fairly easily.

timja commented 10 years ago

brein12:

Well, generally a vanilla Jenkins job may not work properly (and not notify about it) for the following reasons (I'm referring to bash here):

1) The script isn't catching all errors. Jenkins launches the scripts with "-xe" options by default. A more robust way is "-xeu -o pipefail". This is trivial to do.
2) Not all commands return a correct exit code on failure. The "Log parser" plugin often helps here.
3) The node isn't available at the time, as described in this issue. Right now I'm using https://wiki.jenkins-ci.org/display/JENKINS/Monitor+and+Restart+Offline+Slaves, but it's not enough.
4) Some step hangs, so the build never completes and the next one stays in the queue forever. Fortunately, there's a "Build timeout" plugin for that.

I think that's all.
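
To illustrate point 1, here is a small sketch that runs the same snippets through bash with different options and compares exit codes. It assumes `bash` is on the PATH; the helper name and the sample scripts are made up for the example, and `-x` is left out only to keep the output quiet.

```python
import subprocess


def run_bash(options, script):
    """Run a script string through bash with the given options; return the exit code."""
    result = subprocess.run(
        ["bash", *options, "-c", script],
        capture_output=True,
        text=True,
    )
    return result.returncode


# A pipeline whose first command fails but whose last command succeeds.
PIPELINE = "false | true; echo reached"

# A script that expands a variable that is explicitly unset.
UNSET = 'unset SOME_EXAMPLE_VAR; echo "$SOME_EXAMPLE_VAR"'

# With only -e, the pipeline's status is that of its last command, so the failure is missed.
default_rc = run_bash(["-e"], PIPELINE)

# With pipefail, the failing first command fails the pipeline and -e aborts the script.
strict_rc = run_bash(["-eu", "-o", "pipefail"], PIPELINE)

# With -u, expanding an unset variable is an error instead of an empty string.
unset_rc = run_bash(["-eu", "-o", "pipefail"], UNSET)
```

In a Jenkins shell step, the equivalent is typically a `set -euo pipefail` line at the top of the script, or a shebang that passes those options.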