Closed tango-controls-bot closed 8 years ago
Hi Tiago,
We have to be precise here. When you do a CTRL+Z in the window where the DS is running, you send the signal SIGTSTP (number 20 on my ubuntu) to the process. When you use the kill -stop command, you send the signal SIGSTOP (number 19 on my ubuntu) to the process. In both cases, the default action is to stop the process. The process resumes when it receives the SIGCONT signal (18 on my ubuntu). SIGSTOP is like SIGKILL: it cannot be caught, blocked or ignored, so there is nothing we can do for this one. Between SIGSTOP and SIGCONT the process is stopped and obviously, if you try to access it from a client, you will have a timeout. We can ask the OS (at least Unix-like OS) to ignore the signals SIGTSTP and SIGCONT, but is it really wanted? If we do this, there will be no way to put a process in the background by typing CTRL+Z and bg if you have started it in the foreground in a shell window.
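To make the difference between the signals concrete, here is a minimal Python sketch (not Tango code) that asks the OS whether a handler can be installed for each one. The helper name can_install_handler is made up for illustration, and the printed numbers depend on the platform.

```python
import signal


def _noop(signum, frame):
    """Placeholder handler; it is never expected to do anything."""


def can_install_handler(sig):
    """Return True if the OS lets this process install a handler for sig."""
    try:
        previous = signal.signal(sig, _noop)
    except OSError:
        # SIGSTOP and SIGKILL are rejected by the kernel (EINVAL).
        return False
    # Put back whatever was there before (None means "not set from Python").
    if previous is not None:
        signal.signal(sig, previous)
    return True


if __name__ == "__main__":
    for sig in (signal.SIGTSTP, signal.SIGSTOP, signal.SIGCONT, signal.SIGKILL):
        print(sig.name, int(sig), "catchable:", can_install_handler(sig))
```

This has to run in the main thread of the interpreter, since Python only allows signal.signal() there.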
Waiting for comments
Cheers
Manu
Original comment by: taurel
Hi Manu,
The detailed actions are:
you said that: "Between the signal SIGSTOP and SIGCONT, the process is stopped and obviously, if you try to access it from a client, you will have a timeout."
The problem is I DON'T have the timeout exception but instead the client waits forever. I also tried the same example with a DeviceProxy::state() and I get the same result.
I don't want to ignore SIGTSTP in the device server. The situation we have here at ALBA is: some DS "hang" for some reason (they have bugs!). I wanted to create a tango diagnostic tool to check which DS on the machine are hung. I am using Ctrl-Z to simulate a "hang" in the DS. I was hoping that pinging a device in a "hung" DS would trigger the tango timeout exception so that I could report the DS as "hung". But since the ping() call never returns, the diagnostic tool gets stuck on the first device which is "hung".
However if the sequence of actions is:
in this case the ping returns with an exception. So, it appears that the problem happens only if the DeviceProxy object is created AFTER the DS is suspended.
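One way a diagnostic tool could avoid getting stuck on a call that never returns is to run each probe in a helper thread and give up after a deadline. This is only a sketch under assumptions: call_with_deadline is a hypothetical helper, and the probe passed to it would be something like a function that creates the DeviceProxy and calls ping(). It does not fix the hang itself; the blocked thread is simply abandoned.

```python
import threading


def call_with_deadline(fn, timeout_s):
    """Run fn() in a daemon thread; report 'timeout' if it does not finish in time."""
    result = {}

    def worker():
        try:
            result["value"] = fn()
        except Exception as exc:  # e.g. a Tango exception raised by the probe
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        # The probe is still blocked; leave the daemon thread behind.
        return ("timeout", None)
    if "error" in result:
        return ("error", result["error"])
    return ("ok", result["value"])
```

A tool could then loop over device names and report any device whose probe comes back as "timeout". Running the probe in a separate process instead of a thread would be more robust, because a blocked process can be killed.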
Original comment by: tiagocoutinho
Hi Tiago,
I am able to reproduce this behavior in the following case:
1 - Client and server on the same host
2 - Server suspended by a CTRL+Z
3 - DeviceProxy creation followed by a ping() call
but only if the client (point 3) is started more or less one minute after the device server has been suspended.
There is a work-around for this problem. Define the environment variable ORBclientTransportRule="* tcp" for the client process. For me, it solves the problem. The drawback is that the unix socket transport will never be used (lower performance for connections on the same host).
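For a client launched from Python, the work-around could look like the sketch below. The helper name tcp_only_env is made up for illustration; the only thing taken from the discussion above is the variable name and its value. It is assumed here that the variable has to be in the environment before the client process initialises the ORB.

```python
import os


def tcp_only_env(base=None):
    """Return a copy of the environment with the omniORB transport rule forced to TCP."""
    env = dict(os.environ if base is None else base)
    env["ORBclientTransportRule"] = "* tcp"
    return env
```

The result can be passed as the env argument of subprocess.Popen when starting the client, or the same key can be set in os.environ at the very top of the client script, before PyTango is imported.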
There is an even better solution: this behavior is due to an omniORB bug in connection establishment for the unix socket transport. Duncan already sent me a patch for this bug. Apply this patch, re-build omniORB and it should solve the problem. It did on my ubuntu. The patch file is attached to this bug report.
Cheers
Manu
Original comment by: taurel
omniORB patch
Original comment by: taurel
Original comment by: tiagocoutinho
Hi Manu,
I checked and I also need about 1 minute after the DS is suspended to see the problem. Thank you very much for the proposed fixes. I will apply the omniORB patch. Since the bug is not in tango but in omniORB, I mark the bug as closed with resolution None. (I don't want to mark it as deleted so that we don't lose the patches in attachment.)
Original comment by: tiagocoutinho
Hello,
There is a situation where a tango client application will hang. You can reproduce this with either a C++ or Python client. To reproduce it:
start any device server (preferably on linux)
do a Ctrl+Z to suspend the DS (or kill -stop <pid>)
write a tango client something like this one:
    import PyTango
    d = PyTango.DeviceProxy(<name>)
    d.ping()
the ping call will never return.
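The suspend step can also be scripted instead of typed. The sketch below is only an illustration under assumptions: it suspends a dummy child process (a sleeping Python interpreter standing in for the device server), confirms through waitpid that the child really is stopped, then resumes and cleans it up. The function name suspend_and_resume is hypothetical.

```python
import os
import signal
import subprocess
import sys


def suspend_and_resume():
    """Stop a dummy child with SIGSTOP, confirm it is stopped, then resume it."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        os.kill(child.pid, signal.SIGSTOP)  # same effect as: kill -stop <pid>
        # WUNTRACED makes waitpid return when the child stops, not only when it exits.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        stop_signal = os.WSTOPSIG(status) if stopped else None
        os.kill(child.pid, signal.SIGCONT)  # let the child run again
    finally:
        child.kill()
        child.wait()
    return stopped, stop_signal


if __name__ == "__main__":
    print(suspend_and_resume())
```

With a real device server, the client would be started between the SIGSTOP and the SIGCONT.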
Reported by: tiagocoutinho
Original Ticket: tango-cs/bugs/342