hep-gc / cloud-scheduler

Automatically boot VMs for your HTC jobs
http://cloudscheduler.org
Apache License 2.0

Allow un-Retiring VMs? #462

Open berghaus opened 7 years ago

berghaus commented 7 years ago

When jobs are asking for resources that are currently in the retiring state, would it be possible to reactivate those resources? The current behavior, shutting the VM down and then waiting for a new one to come up, certainly works, but it wastes some time and resources.

mhpx commented 7 years ago

Feasible, maybe. Just as we use condor_off, there's a condor_on command. I played around with it a bit some time ago, and I remember a few issues with it. The main one is that once you condor_off'd something, a reissued condor_on had to wait for the off to finish first. So there was a gap during which the machine would de-register, and CS would need extra logic to know that the machine is supposed to come back, so that it doesn't hit race conditions by thinking the machine has failed to register or that something else happened to it.
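To make the race concrete, here is a rough sketch of the kind of bookkeeping that might be needed. It is not taken from the cloud-scheduler code: the `VM` class, the `UNRETIRING` state, `request_unretire`, `reconcile`, and the 300 second grace period are all hypothetical names and values chosen for illustration, and the actual condor_on call is left out. The idea is that a VM being un-retired is allowed to disappear from the collector for a while without being declared failed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VMState(Enum):
    RUNNING = "running"
    RETIRING = "retiring"        # condor_off issued, draining
    UNRETIRING = "unretiring"    # condor_on issued, waiting for re-registration
    FAILED = "failed"


@dataclass
class VM:
    name: str
    state: VMState = VMState.RUNNING
    unretire_started: Optional[float] = None
    saw_gap: bool = False        # True once the VM was seen de-registered


# Hypothetical value: how long a de-registration gap is tolerated.
REREGISTER_GRACE_SECONDS = 300.0


def request_unretire(vm: VM, now: float) -> bool:
    """Mark a retiring VM as coming back. Issuing condor_on is out of scope here."""
    if vm.state is not VMState.RETIRING:
        return False
    vm.state = VMState.UNRETIRING
    vm.unretire_started = now
    vm.saw_gap = False
    return True


def reconcile(vm: VM, registered: bool, now: float,
              grace: float = REREGISTER_GRACE_SECONDS) -> VMState:
    """Update vm.state from one poll telling us whether the VM is registered."""
    if vm.state is VMState.UNRETIRING:
        expired = now - vm.unretire_started > grace
        if not registered:
            if expired:
                vm.state = VMState.FAILED   # it never came back
            else:
                vm.saw_gap = True           # expected gap, keep waiting
        elif vm.saw_gap or expired:
            # Re-registered after the gap, or the gap was missed between polls.
            vm.state = VMState.RUNNING
            vm.unretire_started = None
    elif vm.state is VMState.RUNNING and not registered:
        vm.state = VMState.FAILED           # ordinary failure detection
    return vm.state


vm = VM("vm-1", VMState.RETIRING)
request_unretire(vm, now=0.0)
reconcile(vm, registered=True, now=10.0)    # off still finishing: keep waiting
reconcile(vm, registered=False, now=20.0)   # the gap: not treated as a failure
reconcile(vm, registered=True, now=30.0)    # back again: RUNNING
```

Passing `now` in explicitly keeps the sketch easy to test. A real implementation would also have to decide what to do with jobs matched to the VM during the gap.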

berghaus commented 7 years ago

Sounds like avoiding those conditions will need some tricky logic. I think this is a worthwhile idea, but we should not make it a high priority. What do you think?

mhpx commented 7 years ago

Sounds good. Some of the old code for doing the condor_on can probably be resurrected and updated, but all the logic would be new.