taurus-org / taurus

Moved to https://gitlab.com/taurus-org/taurus
http://taurus-scada.org

crash in attribute after tango DS restart (when events are blocked) #1118

Closed by cpascual 4 years ago

cpascual commented 4 years ago

I didn't have time to investigate, but here is how to reproduce it (100% of the time in my case, with pytango 9.3.1 and tango 9.3.2):

  1. Start tangotest DS
  2. launch this client:
    from taurus.qt.qtgui.application import TaurusApplication
    from taurus.qt.qtgui.input import TaurusValueLineEdit
    import sys
    app = TaurusApplication(cmd_line_parser=None)
    w = TaurusValueLineEdit()
    w.setModel("sys/tg_test/1/short_scalar")
    w.show()
    sys.exit(app.exec_())
  3. stop the tangotest DS (the line edit receives an error event and gets disabled, as expected)
  4. start the tangotest DS (the line edit receives a valid event and gets enabled, as expected)
  5. wait a few seconds (~10?) and the line edit crashes

Maybe this is related to tango-controls/cppTango#686 ?

cpascual commented 4 years ago

Regarding "Maybe this is related to tango-controls/cppTango#686 ?": it does not seem to be the case. I just tried with pytango 9.3.2 and cpptango 9.3.4-rc6 and it can still be reproduced.

cpascual commented 4 years ago

I am able to reproduce the problem without Qt, just using taurus.Attribute (that's why I changed the issue title):

  1. Start tangotest DS
  2. launch this script:
    import taurus
    import time
    a = taurus.Attribute("sys/tg_test/1/double_spectrum")
    while True:
      print('.', end='')
      time.sleep(1)
  3. stop and start the tangotest DS
  4. wait a few seconds (~10s) and the script crashes

cpascual commented 4 years ago

Note: If I replace the taurus.Attribute line with one that just uses a taurus.Device, or one that uses tango.AttributeProxy, the problem is not triggered (probably we need to subscribe, or read...)
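
To check the "we need to subscribe" guess with PyTango alone, something like the following could be tried. It is only a sketch: it assumes that change events are available for the attribute (e.g. polling is configured on the DS), and the names `summarize_event` and `watch` are made up here for illustration.

```python
import time


def summarize_event(event):
    """Return a one-line summary of a Tango event object.

    Relies only on the `err` and `attr_value` members of tango.EventData.
    """
    if event.err:
        return "error event"
    return "value event: %r" % (event.attr_value.value,)


def watch(device_name="sys/tg_test/1", attr_name="double_spectrum"):
    """Subscribe to change events and then idle, like the script above."""
    import tango  # imported here so summarize_event works without PyTango

    proxy = tango.DeviceProxy(device_name)
    event_id = proxy.subscribe_event(
        attr_name,
        tango.EventType.CHANGE_EVENT,
        lambda event: print(summarize_event(event)),
    )
    try:
        while True:
            time.sleep(1)
    finally:
        # Clean up the subscription if the loop is interrupted.
        proxy.unsubscribe_event(event_id)
```

Calling `watch()` and then stopping and starting the tangotest DS would show whether an explicit subscription is enough to trigger the problem.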

cpascual commented 4 years ago

This snippet automates the whole test (including restarting the tangotest DS):
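
The snippet itself does not appear in this thread. Purely as an illustration of how such a test could be automated with the standard library, here is a rough sketch that starts a server and a client as child processes, restarts the server, and reports how the client ended. The helper names (`describe_exit`, `run_restart_test`) and any command lines passed to them are assumptions, not the original code.

```python
import signal
import subprocess
import time


def describe_exit(returncode):
    """Describe a child's exit status (POSIX semantics)."""
    if returncode is None:
        return "still running"
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal -returncode".
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code %d" % returncode


def run_restart_test(server_cmd, client_cmd, pause=2.0, wait_time=20.0):
    """Start server and client, restart the server, then watch the client."""
    server = subprocess.Popen(server_cmd)
    time.sleep(pause)  # give the server time to start
    client = subprocess.Popen(client_cmd)
    time.sleep(pause)  # give the client time to connect
    try:
        # Stop and start the server, as in the manual procedure.
        server.terminate()
        server.wait()
        time.sleep(pause)
        server = subprocess.Popen(server_cmd)
        # Wait and see whether the client survives.
        try:
            client.wait(timeout=wait_time)
        except subprocess.TimeoutExpired:
            pass
        return describe_exit(client.poll())
    finally:
        # Do not leave child processes behind.
        for proc in (client, server):
            if proc.poll() is None:
                proc.terminate()
                proc.wait()
```

With the test script saved as a hypothetical `client.py`, a call such as `run_restart_test(["TangoTest", "test"], ["python", "client.py"])` would be expected to return "killed by SIGSEGV" when the crash is reproduced (the server command line is an assumption and depends on the local installation).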

cpascual commented 4 years ago

Here is some more info I gathered while investigating:

Changing Python version

The problem occurs with both Python 2.7 and Python 3.7

(using pytango 9.3.2, cpptango 9.3.4rc6)

Changing taurus version

The problem occurs for all taurus 4 versions (independently of py2 or py3). It does not occur for taurus 3.7.5

(using pytango 9.3.2, cpptango 9.3.4rc6)

Changing (py)tango versions

cpascual commented 4 years ago

... and here is the backtrace when reproducing this within gdb:

Thread 9 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd243a700 (LWP 27381)]
0x00007fffd3c5bf7e in Tango::EventConsumerKeepAliveThread::reconnect_to_zmq_channel(std::_Rb_tree_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Tango::channel_struct> >&, Tango::EventConsumer*, Tango::DeviceData&) () from /home/cpascual/miniconda/envs/kk3/lib/python3.7/site-packages/tango/../../../libtango.so.9
(gdb) bt
#0  0x00007fffd3c5bf7e in Tango::EventConsumerKeepAliveThread::reconnect_to_zmq_channel(std::_Rb_tree_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Tango::channel_struct> >&, Tango::EventConsumer*, Tango::DeviceData&) () from /home/cpascual/miniconda/envs/kk3/lib/python3.7/site-packages/tango/../../../libtango.so.9
#1  0x00007fffd3c5ef84 in Tango::EventConsumerKeepAliveThread::main_reconnect(Tango::ZmqEventConsumer*, Tango::NotifdEventConsumer*, std::_Rb_tree_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Tango::event_callback> >&, std::_Rb_tree_iterator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Tango::channel_struct> >&) ()
   from /home/cpascual/miniconda/envs/kk3/lib/python3.7/site-packages/tango/../../../libtango.so.9
#2  0x00007fffd3c6073e in Tango::EventConsumerKeepAliveThread::run_undetached(void*) () from /home/cpascual/miniconda/envs/kk3/lib/python3.7/site-packages/tango/../../../libtango.so.9
#3  0x00007fffd3494e48 in omni_thread_wrapper () from /home/cpascual/miniconda/envs/kk3/lib/python3.7/site-packages/tango/../../.././libomnithread.so.4
#4  0x00007ffff7f7dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007ffff7eae4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)

cpascual commented 4 years ago

From the backtrace, I am (wildly) wondering if this is somehow related to tango-controls/pytango#302 or tango-controls/cppTango#316 ... but I am quite lost... can someone help?

cpascual commented 4 years ago

Ok, I finally managed to reproduce it with just PyTango and I reported an issue there:

https://github.com/tango-controls/pytango/issues/371

cpascual commented 4 years ago

...now I wonder if there is some workaround that we could implement to avoid this until it is fixed

Any idea?

cpascual commented 4 years ago

Update: The problem seems to be related to events not reaching the client (see https://github.com/tango-controls/pytango/issues/371#issuecomment-656246236)

cpascual commented 4 years ago

Given that the problem is less serious than originally thought, that the fix should be done in (py)tango and that #1125 provides a workaround, we can close this issue once #1125 is merged.