rhettg / Tron

Next generation batch process scheduling and management

Gracefully handle SSH / server disconnects #4

Closed: rhettg closed this issue 14 years ago

rhettg commented 14 years ago

Tron runs a job by SSHing into a machine and waiting for the job to complete.

I have no idea what it will do if the machine disconnects.

At a minimum, we should mark the job run as being in an 'unknown' state, and optionally leave the job in a state where it can try again once the machine comes back.
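To make the proposal concrete, here is a minimal sketch of what such state handling might look like. It is not Tron's actual code: `RunState`, `JobRun`, `connection_lost`, `node_reconnected`, and the `retry_on_reconnect` flag are all hypothetical names invented for illustration, and no real SSH is involved.

```python
import enum


class RunState(enum.Enum):
    """Hypothetical job run states; the names are assumptions."""
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"


class JobRun:
    """Hypothetical job run that tracks what is known about a remote command."""

    def __init__(self, command, retry_on_reconnect=False):
        self.command = command
        self.retry_on_reconnect = retry_on_reconnect
        self.state = RunState.QUEUED

    def start(self):
        # Called when the command has been sent over the SSH connection.
        self.state = RunState.RUNNING

    def exited(self, exit_status):
        # Called when the remote command reports an exit status.
        self.state = RunState.SUCCEEDED if exit_status == 0 else RunState.FAILED

    def connection_lost(self):
        # The outcome of a running command cannot be determined after a
        # disconnect, so record that instead of guessing success or failure.
        if self.state is RunState.RUNNING:
            self.state = RunState.UNKNOWN

    def node_reconnected(self):
        # Optionally put the run back in the queue once the machine returns.
        if self.state is RunState.UNKNOWN and self.retry_on_reconnect:
            self.state = RunState.QUEUED
```

Making the retry optional matters because not every job is safe to run twice: a run that disconnected may have finished its work on the remote machine even though the result never arrived.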

rhettg commented 14 years ago

After much work and Twisted pain, SSH connection failures are now handled fairly gracefully.

There isn't much in the way of tests for this; it was mostly manual work with a VM host just to see what Twisted did. It might not be reasonable to add much unit testing for this. I'll have to think about it more.
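One possible direction for unit testing, offered only as a sketch: replace the SSH layer with a stub that simulates a dropped connection, so the disconnect path can be exercised without a VM. `FakeChannel` and `run_command` below are hypothetical stand-ins, not Tron or Twisted APIs, and the built-in `ConnectionResetError` is used in place of whatever error the real transport raises.

```python
import unittest


class FakeChannel:
    """Hypothetical stand-in for an SSH channel that drops mid-command."""

    def run(self, command):
        raise ConnectionResetError("simulated disconnect")


def run_command(channel, command):
    """Return 'succeeded', 'failed', or 'unknown' for a single run."""
    try:
        exit_status = channel.run(command)
    except ConnectionError:
        # The connection dropped, so the command's outcome is not known.
        return "unknown"
    return "succeeded" if exit_status == 0 else "failed"


class DisconnectTest(unittest.TestCase):
    def test_disconnect_marks_run_unknown(self):
        self.assertEqual(run_command(FakeChannel(), "sleep 60"), "unknown")
```

Saved to a file, this can be run with `python -m unittest`. A stub like this only checks how the scheduler reacts to an error it is handed; it says nothing about how Twisted itself behaves when a host disappears, which is the part that needed the manual VM testing.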