spotty-cloud / spotty

Training deep learning models on AWS and GCP instances
https://spotty.cloud
MIT License

Pane is dead! #92

Closed aeon0 closed 3 years ago

aeon0 commented 3 years ago

I often run into the following problem: either I got something wrong in my yml file, or, for example, I run tensorboard, exit it because I think I don't need it anymore, and then realize I should have kept it open.

In all these cases I get "Pane is dead" when I run e.g. spotty run tensorboard or spotty run train a second time, after I exited or an error occurred. As a fallback I can use spotty sh, but then I cannot run training and tensorboard in parallel.

Is there a way to "restart" that command?

himat commented 3 years ago

Agreed. The worst part I've found, though, is that if I do spotty sh, exit that tmux window, and then do spotty sh again, I get the "Pane is dead" message! I can deal with "Pane is dead" showing up if I exit a spotty run tensorboard, but I can't even sh into the machine anymore if I exit from spotty sh, so it seems like I'm forced to restart the entire machine to get sh access again.

apls777 commented 3 years ago

> when I run e.g. spotty run tensorboard or spotty run train a second time, after I exited or an error occurred.

You need to kill the pane with Ctrl+b, then x, before running the same command again.

> Is there a way to "restart" that command?

In older versions of Spotty, the spotty run command had a flag to restart the script if it was already running, but I removed this functionality because it caused another problem. If a process is running inside the container, started from a tmux window, and you kill that tmux window, the process inside Docker keeps running. So you would end up running 2 copies of your script in parallel. I didn't find an easy solution to this problem.
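
To illustrate the duplicate-process problem, here is a minimal sketch of how one could check whether a script is still alive inside the container before starting it again. This is not how Spotty works internally. The container name and the process pattern are hypothetical placeholders, and the sketch assumes that `docker` is on the PATH of the instance and that `pgrep` is installed in the container image.

```python
import subprocess
from typing import Callable, List


def pgrep_in_container_cmd(container: str, pattern: str) -> List[str]:
    """Build a `docker exec` command that runs `pgrep -f` inside the container."""
    return ["docker", "exec", container, "pgrep", "-f", pattern]


def is_running(container: str, pattern: str,
               run: Callable = subprocess.run) -> bool:
    """Return True if a process matching `pattern` is alive in the container.

    `pgrep` exits with 0 when at least one process matches and with 1 when
    none does; `docker exec` passes that exit code through.
    """
    result = run(pgrep_in_container_cmd(container, pattern),
                 capture_output=True, text=True)
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    # "my-container" and "train.py" are made-up names for illustration only.
    if is_running("my-container", "train.py"):
        print("The script is still running inside the container; not starting a second copy.")
    else:
        print("No matching process found; it should be safe to run the script again.")
```

A check like this only detects the leftover process. Stopping it safely is a separate question, which is why there is no easy general solution.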

> if I do spotty sh and then exit that tmux window, and then do spotty sh again, I get the "Pane is dead" message!

Instead of using the "exit" command, I would recommend using Ctrl+b, then d to detach from your tmux session, or Ctrl+b, then x to kill the pane.
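
For reference, the two key combinations map to standard tmux commands (`detach-client` and `kill-pane`), so the same actions could be scripted on the instance. The sketch below only builds and runs those commands. The session name is a hypothetical placeholder, since the name Spotty gives its tmux sessions is not stated here, and it assumes tmux is available where the script runs.

```python
import subprocess
from typing import List


def tmux_command(action: str, target: str) -> List[str]:
    """Build the tmux command equivalent to the key bindings mentioned above.

    "detach" corresponds to Ctrl+b, then d (the session keeps running).
    "kill"   corresponds to Ctrl+b, then x (removes the pane, dead or alive).
    """
    actions = {
        "detach": ["tmux", "detach-client", "-s", target],
        "kill": ["tmux", "kill-pane", "-t", target],
    }
    if action not in actions:
        raise ValueError(f"unknown action: {action!r}")
    return actions[action]


def run_tmux(action: str, target: str) -> int:
    """Run the tmux command and return its exit code."""
    return subprocess.run(tmux_command(action, target)).returncode


if __name__ == "__main__":
    # "my-session" is a made-up session name for illustration only.
    print(" ".join(tmux_command("kill", "my-session")))
```

Note that the key binding asks for confirmation before killing the pane, while the scripted `kill-pane` does not.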