Open berghaus opened 7 years ago
Feasible, maybe. Just as we use condor_off, there's a condor_on command. I played around with it a bit some time ago, and I remember a few issues. The main one was that once you had condor_off'd a machine, a subsequent condor_on had to wait for the off to finish first. That left a gap during which the machine de-registered, so CS would need extra logic to know that the machine is supposed to come back. Otherwise it would hit race conditions, concluding that the machine had failed to register or that something else had happened to it.
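To make the race concrete, here is a rough Python sketch of the kind of bookkeeping that might be needed. Everything in it is hypothetical: the `VM` record, the `REACTIVATING` state, the grace period, and the function names are made up for illustration and are not existing CS code. It also assumes the sequence I remember (still registered while the off completes, then de-registered, then registered again), and it leaves out actually issuing condor_on.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VMState(Enum):
    RUNNING = "running"
    RETIRING = "retiring"
    REACTIVATING = "reactivating"  # hypothetical: condor_on issued, waiting for return
    FAILED = "failed"


@dataclass
class VM:
    name: str
    state: VMState = VMState.RUNNING
    reactivate_deadline: Optional[float] = None
    deregistered_seen: bool = False  # has the de-registration gap been observed?


def request_reactivation(vm: VM, now: float, grace_seconds: float = 300.0) -> None:
    """Record that a retiring VM is expected to come back (hypothetical helper)."""
    if vm.state is VMState.RETIRING:
        vm.state = VMState.REACTIVATING
        vm.reactivate_deadline = now + grace_seconds
        vm.deregistered_seen = False


def update_from_collector(vm: VM, registered: bool, now: float) -> None:
    """Reconcile a reactivating VM with whether it is currently registered."""
    if vm.state is not VMState.REACTIVATING:
        return  # other states keep whatever handling they already have
    if not registered:
        # Expected gap: the pending off has finished and the machine dropped out.
        vm.deregistered_seen = True
        if now > vm.reactivate_deadline:
            vm.state = VMState.FAILED  # it never came back within the grace period
    elif vm.deregistered_seen:
        # Registered again after the gap, so treat the VM as usable.
        vm.state = VMState.RUNNING
        vm.reactivate_deadline = None
    # Registered with no gap seen yet: the off is still finishing, keep waiting.
```

The point of the extra state is that a missing registration only counts as a failure once the grace period has run out, rather than the moment the machine disappears.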
Sounds like avoiding those conditions will need some tricky logic. I think this is a worthwhile idea, but we should not make it a high priority. What do you think?
Sounds good. Some of the old code for doing the condor_on can probably be resurrected and updated, but all the logic would be new.
When jobs are asking for resources that are currently in the retiring state, would it be possible to reactivate those resources? The current behavior, shutting down and then waiting for a new VM to come up, certainly works, but it wastes some time and resources.