tango-controls / cppTango

Moved to gitlab
http://tango-controls.org

Crash in local call #328

Closed tango-controls-bot closed 7 years ago

tango-controls-bot commented 8 years ago

Hello. We've stumbled upon another crash that happens with a local-call from a thread - not unlike my previous bug (823). I applied the patch to blackbox.cpp, but now I get a crash that does not go through the blackbox:

#0 0x00007ffff1e1ce62 in omni_thread::get_value (this=0x0, k=2) at threaddata.cc:71
#1 0x00007ffff3e9defa in Tango::DeviceImpl::get_client_ident (this=<optimized out>) at device.cpp:4416
#2 0x00007ffff3ea5255 in Tango::DeviceImpl::check_lock (this=this@entry=0x2007850, meth=meth@entry=0x7ffff40991a5 "command_inout4",
cmd=cmd@entry=0x7fff94005eb8 "Stop") at device.cpp:4820
#3 0x00007ffff3ed3ee2 in Tango::Device_4Impl::command_inout_4 (this=0x2007850, in_cmd=0x7fff94005eb8 "Stop", in_data=..., source=Tango::CACHE_DEV, cl_id=...)
at device_4.cpp:467
#4 0x00007ffff404f342 in _0RL_lcfn_6fe2f94a21a10053_a3000000 (cd=0x7fffa2ff8dc0, svnt=<optimized out>) at tangoSK.cpp:5383
#5 0x00007ffff38bd333 in doLocalCall (servant=<optimized out>, this=0x7fffa2ff8dc0) at ../../../../include/omniORB4/callDescriptor.h:145
#6 omni::omniOrbPOA::dispatch (this=<optimized out>, call_desc=..., id=0x26924b0) at poa.cc:1852
#7 0x00007ffff389e6ee in omniLocalIdentity::dispatch (this=0x26924b0, call_desc=...) at localIdentity.cc:145
------------SNIP--------------
#142 0x00007ffff78180db in start_thread () from /lib64/libpthread.so.0
#143 0x00007ffff7548e3d in clone () from /lib64/libc.so.6

Johan here https://sourceforge.net/p/tango-cs/bugs/814/#da32 seems to have the same problem.

Reported by: schneidemwe ( http://sf.net/u/schneidemwe )

Original Ticket: tango-cs/bugs/827

tango-controls-bot commented 7 years ago

Hi Marius,

As you have guessed, this problem is similar to the one you had in bug 823. It happens because the thread you create is not an omni_thread. Before C++11, users who wanted to create a thread within their Tango class used omni_thread. With C++11 and its thread class, it is very easy to create your own thread that is not an omni_thread. The crash happens because Tango assumes that the calling thread is an omni_thread, and it is not. While this was relatively easy to solve in the blackbox, that is not the case here. But there is a workaround: you can make omni_thread aware of threads that it has not created. To do so, add the following as the first line of your thread code:

omni_thread::ensure_self es;

and keep this object alive until the end of the thread. This should solve your problem. BTW, the next major Tango release will not be based on omni_thread, so this kind of problem will disappear.
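To illustrate why the guard object matters, here is a simplified, self-contained model of the mechanism. It is only a sketch: `ModelThread`, `current_thread`, `self()`, `EnsureSelf` and `record_visible()` are hypothetical names invented for this example, and the code does not reproduce omniORB's actual implementation. It assumes the library keeps a per-thread record that stays null for threads it did not create, which matches the `this=0x0` in frame #0 of the backtrace.

```cpp
#include <thread>

// Simplified stand-in for a per-thread record such as omni_thread.
// All names here are hypothetical; this is not omniORB code.
struct ModelThread {
    int client_ident = 0;  // stand-in for a per-thread value slot
    int get_value() const { return client_ident; }
};

// One pointer per OS thread; it stays null unless the thread is registered.
thread_local ModelThread* current_thread = nullptr;

// Plays the role of a "who am I" lookup: null for unregistered threads.
ModelThread* self() { return current_thread; }

// RAII guard in the spirit of omni_thread::ensure_self: registers a dummy
// record on construction if none exists, and removes it on destruction.
class EnsureSelf {
public:
    EnsureSelf() : created_(self() == nullptr) {
        if (created_) current_thread = new ModelThread();
    }
    ~EnsureSelf() {
        if (created_) {
            delete current_thread;
            current_thread = nullptr;
        }
    }
    EnsureSelf(const EnsureSelf&) = delete;
    EnsureSelf& operator=(const EnsureSelf&) = delete;

private:
    bool created_;  // true only if this guard created the record
};

// Runs a plain std::thread and reports whether a record was visible inside it.
bool record_visible(bool use_guard) {
    bool visible = false;
    std::thread worker([&] {
        if (use_guard) {
            EnsureSelf es;  // first statement of the thread body
            visible = (self() != nullptr);
            // While es is alive, self()->get_value() has a valid object.
        } else {
            // No guard: self() is null, so calling through it would crash.
            visible = (self() != nullptr);
        }
    });
    worker.join();
    return visible;
}
```

In this model, `record_visible(true)` finds a record and `record_visible(false)` does not. In real Tango 9 code the equivalent of the guard is the single line `omni_thread::ensure_self es;` shown above, declared in a scope that lasts for the whole thread function.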

Hoping this helps

Original comment by: taurel (http://sourceforge.net/u/taurel)

tango-controls-bot commented 7 years ago

Unfortunately, these are not threads I have any control over. It happens with sardana/spock, e.g.: defctrl DummyMotorController testctrl

Can you not provide a hot-fix, e.g. creating the omni_thread environment on-demand before accessing it? Another possible fix would be to disable the local-calls, if that is in any way possible. I.e. making sure that everything goes through the network and only tango's own threads process it.

We did not observe this behaviour on any new (C++11) code yet. So far, it has only been older code that was working happily with tango 8.

Original comment by: schneidemwe (http://sourceforge.net/u/schneidemwe)

tango-controls-bot commented 7 years ago

Hi,

I confirm this bug is not there in Tango 8. It was introduced by the following change in the blackbox.cpp file: https://sourceforge.net/p/tango-cs/code/28946/tree//api/cpp/cppapi/branches/Tango_900/server/blackbox.cpp?diff=28853

A quick and dirty fix is to remove the following two lines at the end of the BlackBox::get_client_host() method.

if (dummy == true)
         omni_thread::release_dummy();

BUT please be aware that this will re-introduce the memory leak (120 bytes per custom thread created if this thread is not an omnithread) which was present in Tango 8.

Hoping this helps in the meantime.

Kind regards, Reynald

Original comment by: bourtemb (http://sourceforge.net/u/bourtemb)

tango-controls-bot commented 7 years ago

Just to understand the problem better: does the Pool/Sardana DS also crash if you create the controller instance directly with the command on the Pool device, e.g.:

import PyTango

pool = PyTango.DeviceProxy("<pool name>")  # substitute the name of your Pool device
argin = ['Motor', 'DummyMotorController', 'DummyMotorController', 'testctrl']
pool.CreateController(argin)

At Alba we are using Sardana with Tango 9 in one of the beamlines and they have not suffered this problem.

Original comment by: zreszela (http://sourceforge.net/u/zreszela)

tango-controls-bot commented 7 years ago

Hi Zibi, no, this version works for us as well, as it is not in the same thread. The problem only occurs if we run a command inside spock, for example

defctrl DummyMotorController testctrl

or if we use threads in our controller, e.g. to stop a motion, just using the same code that was working with Tango 8. With Tango 9, the Sardana door crashes with a “segmentation fault”.

We are currently building the bugfix as suggested by Reynald.

Best regards, Doris

Original comment by: ressmann (http://sourceforge.net/u/ressmann)

tango-controls-bot commented 7 years ago

Some more questions:

If you write a new macro containing the code from my previous message and execute it, does it also crash?

Meanwhile, let's hope that the workaround proposed by Reynald will work for you.

Original comment by: zreszela (http://sourceforge.net/u/zreszela)

tango-controls-bot commented 7 years ago

We just have a single Sardana server.

If I execute RunMacro, I see the same behaviour: Sardana crashes with a segmentation fault. This is the output in Jive:

Command: Door/xspec/1/RunMacro
Duration: 306 msec
Argin: "defctrl","DummyMotorController","testctrl2"
Output argument(s) :
array length: 1
[0]  <sequence><macro name="defctrl" id="-1"><param name="class" value="DummyMotorController"/><param name="name" value="testctrl2"/><paramrepeat name="roles_props"/></macro></sequence>

Our build server had a hardware crash, so I could not yet test the workaround.

Original comment by: ressmann (http://sourceforge.net/u/ressmann)

tango-controls-bot commented 7 years ago

Good news: the workaround helps a lot.

No crash any more. Thank you so much

Original comment by: ressmann (http://sourceforge.net/u/ressmann)

bourtemb commented 7 years ago

Problem solved in Tango 9.2.5, even though the solution re-introduced a small memory leak.