Closed diregoblin closed 4 years ago
That sounds good. I'm not sure whether it would be better to just remove the graphical output (it's kind of a distraction from the MUSCLE part that we're trying to demonstrate anyway) or to make it optional, so that at least things don't crash if python-tk or an X server is missing. The latter would add more distraction still, so I'm tempted to remove it, but on the other hand, having a nice picture on your screen is also nice...
About the backtrace, while it's not perfect, I'm not that unhappy with it actually:
_tkinter.TclError: couldn't connect to display "node-10n:10.0"
shows that the problem is in connecting to the X server (although that's probably not obvious for most people). The micro instance quit because a peer disappeared. There should be a better message than just EOFError there. I'm not sure if there is one in the log file actually.
Did you look at the log files? Are they helpful here?
Not really, I simply ran ssh -X when I saw that :)
I've reproduced the issue, and the logs aren't very helpful here. The manager log only contains the basic startup (I think similar to a normal run?), and the macro and micro logs are empty.
Maybe you can keep two examples, one with graphic output and one without?
Right, those logs are useless. I need to catch that exception in Instance.reuse_instance(), and log an error before quitting. Or even better, try to reconnect, in case it's just a dropped connection. But since those are rare on HPC machines, I haven't implemented that yet.
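To make the "catch, log, then quit" idea concrete, here is a minimal sketch. It is not MUSCLE code: `run_reuse_loop`, `receive` and `handle` are hypothetical names standing in for the reuse loop and the peer communication, and the only assumption taken from this thread is that a vanished peer shows up as an EOFError.

```python
import logging

logger = logging.getLogger('example.instance')


def run_reuse_loop(receive, handle):
    """Process messages until `receive` returns None.

    `receive` and `handle` are hypothetical callables used for
    illustration only; they are not part of any real library API.
    Returns the number of messages handled.
    """
    handled = 0
    while True:
        try:
            message = receive()
        except EOFError:
            # Assumption: a peer that disappears surfaces as EOFError.
            # Log something readable before the exception propagates.
            logger.error(
                'Peer closed the connection after %d message(s); '
                'it may have crashed, check its output and log file.',
                handled)
            raise
        if message is None:
            return handled
        handle(message)
        handled += 1
```

Re-raising keeps the original backtrace available for debugging, while the log file now records why the instance stopped. A reconnect attempt could go in the same `except` branch.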
I like having two examples. And maybe leaving it as an exercise to the reader to swap out the implementation. Although I guess most readers would just comment out the plotting lines in the original...
Okay, I've actually kept a single example, but made it possible to disable plotting by defining an environment variable. That should make things work on text-only machines. See commit 1c72f98.
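For readers who want to do the same in their own scripts, a rough sketch of the pattern follows. The variable name `EXAMPLE_NO_PLOT` and the helper functions are made up for illustration; the name actually used in the commit may differ, and this is not a copy of the example's code.

```python
import os

# Hypothetical variable name, chosen for illustration only.
NO_PLOT_VAR = 'EXAMPLE_NO_PLOT'


def plotting_enabled(environ=os.environ):
    """Return True unless the opt-out variable is defined (any value)."""
    return NO_PLOT_VAR not in environ


def show_results(values, environ=os.environ):
    """Plot the values if plotting is enabled, else print a text summary."""
    if plotting_enabled(environ):
        # Import lazily, so text-only machines never load the GUI stack.
        from matplotlib import pyplot as plt
        plt.plot(values)
        plt.show()
        return 'plot'
    print('min={:.3f} max={:.3f}'.format(min(values), max(values)))
    return 'text'
```

Keeping the matplotlib import inside the guarded branch matters: with a module-level import, a machine without python-tk or an X server could still fail before the check is reached. Defining the variable to any value in the shell before starting the script is enough to switch to text output.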
I'll make a separate issue for better error messages when a peer disappears. I'll probably pick that up when I implement automatic start-up of components via a pilot job framework, as I need to consider life cycle then anyway, and right now I need to get Fortran support released.
Better errors when a peer disappears is now #31. Closing this one, as the X issue is fixed. Please reopen if the fix does not work for you.
The current Python example is nice, but it doesn't run correctly if you don't have an X environment (e.g. when using ssh without X forwarding).
For the record, in that particular case it produced the following (rather ugly) log:
Enabling X11 forwarding fixes this issue, of course. However, having a basic command-line-only example for testing on remote machines would be nice, I guess.