spacetelescope / imexam

imexam is a python tool for simple image examination, and plotting, with similar functionality to IRAF's imexamine
http://imexam.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Setting XPA_METHOD=local is not the solution #162

Closed chris-simpson closed 5 years ago

chris-simpson commented 5 years ago

I have seen the following behaviour in Py3.6.8/imexam0.8.1 on CentOS7; Py2.7.15/imexam0.8.0 on the same OS, and Py2.7.14/imexam0.8.0 on MacOS.

In a terminal, with no ds9 running on the machine and XPA_METHOD not set, I run the following script. (It's important that it's run as a script from the command line and NOT as commands in an interactive session, for reasons that will become clear later.)

    #!/usr/bin/env python

    import numpy as np
    import imexam
    viewer = imexam.connect()
    viewer.view(np.ones((100,100)))

It crashes with the "xpa.XpaException: Unknown XPA Error : XPAGet returned 0!" error, for which your recommended fix is to set XPA_METHOD=local.

So I do that and it works. But now I run this script where I call list_active_ds9():

    #!/usr/bin/env python

    import numpy as np
    import imexam
    imexam.list_active_ds9()
    viewer = imexam.connect()
    viewer.view(np.ones((100,100)))

and it crashes with the same error as before. So I haven't really solved the problem. What actually does fix the problem is keeping XPA_METHOD unset but putting a sleep between the imexam.connect() call and the first attempt to display anything. So:

    #!/usr/bin/env python

    import numpy as np
    import imexam
    import time
    imexam.list_active_ds9()  # this line is irrelevant
    viewer = imexam.connect()
    time.sleep(2)
    viewer.view(np.ones((100,100)))

A 2-second sleep seems to always work. A 1-second sleep seems to work about half the time. So if you type the commands into an interactive python session, you'll probably never see a problem because you have a sufficient delay between starting the ds9 process and displaying something. Similarly, a script that starts a ds9_viewer at the beginning and then spends a couple of seconds reading data before displaying it will also work.

I really don't understand the details (because I've never looked at any of this before last Friday), but I guess the default XPA_METHOD, inet, has a cycle time of 1-2 seconds, so that delay is required between starting the viewer and communicating with it, whereas with the "local" method everything happens pretty much instantaneously and the delay isn't required. Or something like that. I'm not really sure why list_active_ds9() fouls things up, but you have a comment in the code that it's only listening on the inet socket, so there's probably some relationship.

I hope you see the same behaviour as me (because I nearly went mad trying to get reproducible results), in which case it probably makes sense to force a 2-second sleep when starting a new ds9 process?
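As an alternative to a fixed sleep, a script could retry the first display call until it succeeds. The following is only a minimal sketch of that idea: `retry_until_ready` is a hypothetical helper, not part of imexam, and it catches any exception by default because the exact import path of the XPA exception class is not given in this thread.

```python
import time


def retry_until_ready(action, timeout=10.0, interval=0.25, exceptions=(Exception,)):
    """Call action() repeatedly until it succeeds or `timeout` seconds elapse.

    Hypothetical helper (not part of imexam). Returns whatever action()
    returns; re-raises the last error once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except exceptions:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last error
            time.sleep(interval)


# Assumed usage with imexam (not run here; needs ds9 and XPA installed):
#   import numpy as np
#   import imexam
#   viewer = imexam.connect()
#   retry_until_ready(lambda: viewer.view(np.ones((100, 100))))
```

Compared with `time.sleep(2)`, this waits only as long as needed on a fast machine and keeps trying on a slow one, at the cost of hiding genuine errors until the timeout expires.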

sosey commented 5 years ago

Hi Chris,

I'm usually interacting with imexam and ds9 through an interactive terminal session, so thank you for reporting the issue.

I'm not an expert, but inet sets up an IP communication socket over a network, local uses a Unix file reference as the socket, and localhost sets up a more complete TCP protocol for communication. Each has different setup and communication timing limits; in general, Unix local sockets should be faster than inet IP sockets while running, but perhaps not during setup. By default, imexam has a flexible 10-second delay built into connect(). This was intended to allow time for ds9 to open, for X11/XQuartz to start up if required, and for the connection to be established, though in my experience the time actually needed for interactive use is shorter.

imexam and ds9 itself should default to inet. Local sockets are sometimes suggested when there is no internet connection and an IP address cannot be found or established for the computer. The issue with the server listening only on inet arises only when there are already other ds9 processes running that were started with inet communication. XPA prefers all windows to use the same communication protocol.
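For scripts, the method can also be chosen from Python instead of the shell. This is a sketch only: `set_xpa_method` is a hypothetical helper, and it assumes the variable has to be set before ds9 is launched (for example, before calling imexam.connect()) so that the new process inherits it.

```python
import os

# The three methods discussed in this thread.
VALID_XPA_METHODS = ("inet", "local", "localhost")


def set_xpa_method(method, env=None):
    """Set XPA_METHOD in `env` (os.environ by default).

    Hypothetical helper, not part of imexam. Returns the previous
    value, or None if the variable was unset.
    """
    if method not in VALID_XPA_METHODS:
        raise ValueError("unknown XPA method: %r" % (method,))
    env = os.environ if env is None else env
    previous = env.get("XPA_METHOD")
    env["XPA_METHOD"] = method
    return previous


# Assumed usage (not run here):
#   set_xpa_method("local")
#   import imexam
#   viewer = imexam.connect()
```

Returning the previous value makes it easy to restore the environment afterwards if other ds9 windows on the machine were started with a different method.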

I don't think that list_active_ds9() itself is causing issues, at least I haven't seen that. When I have no ds9 running, and X11 hasn't started I see it report no ds9 open as expected, and see the XPA communication issue:

    No active sessions registered
    Traceback (most recent call last):
      File "test", line 7, in <module>
        viewer.view(np.ones((100,100)))
      File "/Users/sosey/miniconda3/envs/jwstdev/lib/python3.6/site-packages/imexam-0.8.2.dev373-py3.6-macosx-10.7-x86_64.egg/imexam/connect.py", line 505, in view
        self.window.view(*args, **kwargs)
      File "/Users/sosey/miniconda3/envs/jwstdev/lib/python3.6/site-packages/imexam-0.8.2.dev373-py3.6-macosx-10.7-x86_64.egg/imexam/ds9_viewer.py", line 1718, in view
        frame = self.frame()
      File "/Users/sosey/miniconda3/envs/jwstdev/lib/python3.6/site-packages/imexam-0.8.2.dev373-py3.6-macosx-10.7-x86_64.egg/imexam/ds9_viewer.py", line 975, in frame
        frame = self.get("frame").strip()  # xpa returns '\n' for no frame
      File "/Users/sosey/miniconda3/envs/jwstdev/lib/python3.6/site-packages/imexam-0.8.2.dev373-py3.6-macosx-10.7-x86_64.egg/imexam/ds9_viewer.py", line 693, in get
        return self.xpa.get(param)
      File "/Users/sosey/miniconda3/envs/jwstdev/lib/python3.6/site-packages/imexam-0.8.2.dev373-py3.6-macosx-10.7-x86_64.egg/imexam/xpa_wrap.py", line 14, in get
        return super(XPA, self).get(param.encode('utf-8', 'strict')).decode()
      File "wrappers/xpa.pyx", line 172, in xpa.xpa.get
      File "wrappers/xpa.pyx", line 112, in xpa._get
    xpa.XpaException: Unknown XPA Error : XPAGet returned 0!

The communication error arises because, when you run the commands non-interactively, they are executed without the wait that the interactive shell injects. As long as an X11 server is already running, the 2-3 second sleep does seem to allow enough time for all the connections to be established. If X11 also needs to start up, it looks like a few more seconds are required. I'm guessing this might be another factor you're running into. The specific error is reported because I have several methods that internally try to track changes the user makes to ds9 through the DS9 GUI itself instead of the imexam API. The first time inet initializes, it may also fail because ds9 will ask to approve incoming connections through a dialog window. Once that's approved, a second call should run smoothly.

I would like to avoid adding too many delays to the regular calls because that would slow down interactive users, but I think your sleep solution is good for non-interactive users. When I get the chance, I'll see if there's an XPA class I can wrap to check that the socket has been established; XPA itself has some delays incorporated for socket and data communication. I'm already checking for creation of the Unix local socket filename, but that could be out of sync with the full communication socket establishment.
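To illustrate the difference between the socket file existing and the socket actually accepting connections, here is a rough sketch using only the Python standard library. `wait_for_unix_socket` is a hypothetical helper, not imexam or XPA code; it applies only to the local (Unix socket) method, and whether a bare connect-and-close is harmless to a real XPA server is untested.

```python
import os
import socket
import time


def wait_for_unix_socket(path, timeout=10.0, interval=0.1):
    """Return True once a Unix-domain socket at `path` accepts a
    connection, or False if `timeout` seconds pass first.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            try:
                probe.connect(path)
                return True
            except OSError:
                pass  # file exists, but nothing is listening yet
            finally:
                probe.close()
        time.sleep(interval)
    return False
```

A check along these lines would separate "the filename has been created" from "something is listening", which is the gap described above.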

In the meantime, I'll clarify the documentation about when to use local, inet, and localhost connections, as well as sleeps for non-interactive users.

I'm not sure what your ultimate goal is. If you're coding a quicklook, adding a wait so the user has time to look at the image before the processes are killed is good. If you only care about results from specific imexam commands on objects, you could use it without displaying to ds9 and just save the plots or images, which can be displayed in sequence or checked at will. If the source object locations are unknown ahead of time, you could add a photutils call to detect them and return the locations to imexam for plotting.

cheers, megan

prdurrell commented 5 years ago

This thread has helped me, as I am having the same issues... I too am having trouble getting viewer = imexam.connect() to work properly (e.g. to display an image after starting DS9) without a time.sleep(2). However, my goal is to use imexamine interactively, so I would be interested to know if there is a workaround for this. Or did I miss any new documentation on this?

MacOS El Capitan, DS9 7.6, imexam v0.8.2

sosey commented 5 years ago

@prdurrell are you trying to use imexam interactively or calling it from a script?

sosey commented 5 years ago

If you could open a new issue and paste in the output you are seeing, along with your use case, that would be helpful. We can link the two issues if they turn out to have a similar root cause. Thanks!