rootfs / node-fencing

Apache License 2.0

Handling handleNodeFenceError #28

Open bronhaim opened 6 years ago

bronhaim commented 6 years ago

After reaching the give-up retry limit, we need to retry triggering all of the step's jobs. The give-up retry counter is raised on each job poll while the jobs have still not finished successfully.
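A minimal sketch of that counting logic, under stated assumptions: `stepState`, `poll`, and `maxGiveUpRetries` are hypothetical names, not identifiers from the node-fencing code, and the limit of 3 as well as the reset on success are illustrative choices only.

```go
package main

import "fmt"

// maxGiveUpRetries is an assumed limit; the real value is not given in this thread.
const maxGiveUpRetries = 3

// stepState is a hypothetical record of one fencing step that is being polled.
type stepState struct {
	giveUpRetries int // polls so far that found the step's jobs unfinished
	triggers      int // how many times the step's jobs were re-triggered
}

// poll records one polling round. It raises the give-up counter while the
// jobs have not succeeded and, once the limit is reached, resets the counter
// and reports that all of the step's jobs should be triggered again.
func (s *stepState) poll(allJobsSucceeded bool) bool {
	if allJobsSucceeded {
		s.giveUpRetries = 0 // assumption: success clears the counter
		return false
	}
	s.giveUpRetries++
	if s.giveUpRetries < maxGiveUpRetries {
		return false
	}
	s.giveUpRetries = 0
	s.triggers++
	return true
}

func main() {
	s := &stepState{}
	for i := 1; i <= 4; i++ {
		fmt.Printf("poll %d: retrigger=%v\n", i, s.poll(false))
	}
}
```

In a real controller the `true` result would be the point at which every job of the step is created again; here it is only counted.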

rgolangh commented 6 years ago

By the way, how do you keep jobs from colliding? For example, a job to fence a host and a job to un-fence it (or maybe cordon and uncordon are better terms)?

bronhaim commented 6 years ago

The fence and un-fence scripts are separate and unrelated to each other. Any script can perform the opposite action, and I can't prevent that. The admin sets the methods to run in each step, so a declaration that causes a collision is less likely.

bronhaim commented 6 years ago

Once a job has failed, we need to set anti-affinity to avoid running the job on the same node again.
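One possible way to express this, assuming the retried job's pod template can carry a node affinity rule: a required `NotIn` match on the well-known `kubernetes.io/hostname` label keeps the pod off the listed nodes. The helper `avoidNodes` is hypothetical, and the sketch builds the `affinity` fragment as plain maps so it runs without the Kubernetes client libraries.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// avoidNodes (hypothetical helper) returns the "affinity" fragment of a pod
// spec that forbids scheduling onto any of the given node host names.
func avoidNodes(failedNodes []string) map[string]interface{} {
	return map[string]interface{}{
		"nodeAffinity": map[string]interface{}{
			"requiredDuringSchedulingIgnoredDuringExecution": map[string]interface{}{
				"nodeSelectorTerms": []interface{}{
					map[string]interface{}{
						"matchExpressions": []interface{}{
							map[string]interface{}{
								"key":      "kubernetes.io/hostname",
								"operator": "NotIn", // exclude the nodes where the job failed
								"values":   failedNodes,
							},
						},
					},
				},
			},
		},
	}
}

func main() {
	// "node-1" stands in for the node on which the previous job failed.
	out, err := json.MarshalIndent(avoidNodes([]string{"node-1"}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed fragment would sit under `spec.template.spec.affinity` of the retried Job. Whether a hard requirement or a preferred rule fits better here is left open.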