mozilla / snakepit

Machine learning job scheduler
Mozilla Public License 2.0

Allow job specified cleanup script #178

Open reuben opened 4 years ago

reuben commented 4 years ago

Doing things inside /data/rw/pit in interactive jobs is very painful because of sshfs: a git status can take tens of seconds to complete. But doing things outside of the snakepit mounts means risking data loss if something goes wrong unexpectedly and your job gets stopped/killed, or if you stop it and forget to copy things first.

If we could provide a cleanup script that is executed when a job ends, then I could copy any critical folders (checkpoints, etc.) into networked folders to make sure nothing is forgotten.

tilmankamp commented 4 years ago

How about something like this?

reuben commented 4 years ago

Would that work if I pit stop a job?

reuben commented 4 years ago

I'll play around with the suggestions in that SO thread and close the issue if they solve my case. In particular, trap might be enough, since it catches signals.
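For reference, a minimal sketch of the trap idea, not a tested snakepit recipe. WORK_DIR and BACKUP_DIR are hypothetical names (fast local scratch space and a networked target), and the backgrounded subshell is a stand-in for the real workload. It also assumes that stopping a job delivers a catchable signal such as SIGTERM, which is not confirmed anywhere in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical paths: local scratch space and a networked target directory.
# They default to temporary directories so the sketch runs anywhere.
WORK_DIR="${WORK_DIR:-$(mktemp -d)}"
BACKUP_DIR="${BACKUP_DIR:-$(mktemp -d)}"

cleanup() {
    # Copy everything from local scratch into the networked directory.
    mkdir -p "$BACKUP_DIR"
    cp -a "$WORK_DIR/." "$BACKUP_DIR/"
}

# Run cleanup whenever the shell exits.
trap cleanup EXIT
# Turn catchable signals into a regular exit so the EXIT trap fires.
trap 'exit 143' TERM
trap 'exit 130' INT

# Stand-in for the real workload (assumption: replace with the training command).
# Running it in the background and calling `wait` lets the shell handle a
# signal right away instead of only after the child has finished.
( echo "step 1" > "$WORK_DIR/checkpoint.txt" ) &
wait "$!"
```

SIGKILL cannot be trapped, so this only helps if the job is stopped with a signal the shell is able to catch.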

tilmankamp commented 4 years ago

Ah - you are right - I just had the failure case in mind. No, I am pretty sure it won't execute in that case.