taurus-org / taurus

Moved to https://gitlab.com/taurus-org/taurus
http://taurus-scada.org

Avoid deadlock between push_event and poll #1061

Closed cmft closed 4 years ago

cmft commented 4 years ago

The Tango attribute deadlocks when it cannot connect to a device (e.g. the device is down).

Use disablePolling instead of _deactivatePolling to update internal variables.

Move disablePolling out of the critical region in case of PyTango.DevFailed.
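
The second point can be illustrated with a small, self-contained sketch. It is not the taurus code: `SketchAttribute`, `_read_from_device`, `_lock_is_free` and the local `DevFailed` class are hypothetical stand-ins, and only the name `disablePolling` is taken from this PR. The idea shown is to record the failure inside the critical region and make the potentially blocking call only after the lock has been released.

```python
import threading


class DevFailed(Exception):
    """Stand-in for PyTango.DevFailed so the sketch runs without PyTango."""


def _lock_is_free(lock):
    """Return True if a *different* thread can acquire `lock` right now."""
    result = []

    def probe():
        got = lock.acquire(blocking=False)
        if got:
            lock.release()
        result.append(got)

    t = threading.Thread(target=probe)
    t.start()
    t.join()
    return result[0]


class SketchAttribute:
    """Hypothetical, simplified attribute; not the real TangoAttribute."""

    def __init__(self):
        self._read_lock = threading.RLock()
        self.polling_enabled = True
        self.lock_free_during_disable = None

    def _read_from_device(self):
        # Pretend the device is down.
        raise DevFailed("device is down")

    def disablePolling(self):
        # Record whether other threads could take the read lock at this point.
        self.lock_free_during_disable = _lock_is_free(self._read_lock)
        self.polling_enabled = False

    def read(self):
        failed = False
        with self._read_lock:
            try:
                self._read_from_device()
            except DevFailed:
                # Inside the critical region: only remember the failure.
                failed = True
        if failed:
            # Outside the critical region: safe to call something that may block.
            self.disablePolling()
        return not failed
```

With this structure, a thread that needs the read lock (a poller, for instance) is not kept waiting while `disablePolling` does its work.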

cmft commented 4 years ago

This PR fixes the crash of taurus form when the device is down, and it also seems to fix the reconnection issues when the device is restarted.

taurus --log-level=Debug form sys/tg_test/1/float_scalar

cmft commented 4 years ago

This PR could fix #1060

cpascual commented 4 years ago

I am merging after allowing failures in the py2qt5 tests. This needs to be reverted once #1073 is fixed.

reszelaz commented 4 years ago

I just found that there may still be a problem here. When running sardanatestsuite, the testsuite process hung. The thread backtraces point to this issue (see threads 31 and 17):

(gdb) t a a py-bt

Thread 31 (Thread 0x7f654cff9700 (LWP 23498)):
Traceback (most recent call first):
  <built-in method __enter__ of _thread.RLock object at remote 0x7f65684549f0>
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tango/tangoattribute.py", line 511, in poll
    with self.__read_lock:
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tango/tangodevice.py", line 347, in __pollResult
    attr.poll(single=False, value=v, error=err, time=ts)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tango/tangodevice.py", line 367, in __pollReply
    self.__pollResult(attrs, ts, result)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tango/tangodevice.py", line 372, in poll
    return self.__pollReply(attrs, req_id)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tauruspollingtimer.py", line 156, in _pollAttributes
    dev.poll(attrs, req_id=req_id)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/timer.py", line 93, in __run
    self.__function(*self.__args, **self.__kwargs)
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 30 (Thread 0x7f654d7fa700 (LWP 22126)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f6568fbecb0>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 29 (Thread 0x7f654dffb700 (LWP 22125)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f6568fbef80>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 28 (Thread 0x7f654e7fc700 (LWP 22124)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65680fa440>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 27 (Thread 0x7f654effd700 (LWP 22123)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f6568fbe4b8>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 26 (Thread 0x7f654f7fe700 (LWP 22122)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65683d0c60>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 25 (Thread 0x7f654ffff700 (LWP 22121)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f6568fbe698>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 24 (Thread 0x7f6560c69700 (LWP 22120)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ac3cf198>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 23 (Thread 0x7f656146a700 (LWP 22119)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65680fa300>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 22 (Thread 0x7f6561c6b700 (LWP 22118)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f6568fbe580>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 21 (Thread 0x7f656246c700 (LWP 22117)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65683d0698>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 20 (Thread 0x7f658ffff700 (LWP 19878)):
Unable to locate python frame

Thread 19 (Thread 0x7f6587fff700 (LWP 19877)):
Unable to locate python frame

Thread 18 (Thread 0x7f65a4ff9700 (LWP 19876)):
Unable to locate python frame

Thread 17 (Thread 0x7f65a57fa700 (LWP 19875)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65680fa3c8>
  File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "/usr/lib/python3.5/threading.py", line 1054, in join
    self._wait_for_tstate_lock()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/timer.py", line 86, in stop
    self.__thread.join()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tauruspollingtimer.py", line 63, in stop
    self.timer.stop(sync=sync)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tauruspollingtimer.py", line 131, in removeAttribute
    self.stop(sync=True)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/taurusfactory.py", line 361, in removeAttributeFromPolling
    timer.removeAttribute(attribute)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/taurusattribute.py", line 232, in _deactivatePolling
    self.factory().removeAttributeFromPolling(self)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/taurusattribute.py", line 210, in disablePolling
    self._deactivatePolling()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tango/tangoattribute.py", line 935, in _pushAttrEvent
    self.disablePolling()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/tango/tangoattribute.py", line 881, in push_event
    etype, evalue = self._pushAttrEvent(event)
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/event.py", line 141, in __call__
    return func(obj, *args, **kwargs)
  File "/home/zreszela/workspace/pytango/tango/green.py", line 92, in submit
    return fn(*args, **kwargs)
  File "/home/zreszela/workspace/pytango/tango/green.py", line 210, in greener
    return executor.submit(fn, *args, **kwargs)

Thread 16 (Thread 0x7f65a5ffb700 (LWP 19874)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ad047238>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 15 (Thread 0x7f65a67fc700 (LWP 19873)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ad02fbe8>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 14 (Thread 0x7f65a6ffd700 (LWP 19872)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ad02fb20>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 13 (Thread 0x7f65a77fe700 (LWP 19871)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ad02fa30>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 12 (Thread 0x7f65a7fff700 (LWP 19870)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ad0190f8>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 11 (Thread 0x7f65acfd8700 (LWP 19869)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65ad070b20>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/usr/lib/python3.5/queue.py", line 164, in get
    self.not_empty.wait()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/threadpool.py", line 139, in run
    cmd, args, kw, callback, th_id, stack = get()
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 882, in _bootstrap
    self._bootstrap_inner()

Thread 10 (Thread 0x7f65ad8d9700 (LWP 19866)):
Traceback (most recent call first):
  Waiting for the GIL

Thread 9 (Thread 0x7f65b5284700 (LWP 19864)):
Unable to locate python frame

Thread 8 (Thread 0x7f65b5a85700 (LWP 19863)):
Unable to locate python frame

Thread 7 (Thread 0x7f65b6286700 (LWP 19862)):
Unable to locate python frame

Thread 6 (Thread 0x7f65cadd0700 (LWP 19861)):
Unable to locate python frame

Thread 5 (Thread 0x7f65c5e0a700 (LWP 19860)):
Unable to locate python frame

Thread 4 (Thread 0x7f65c3609700 (LWP 19859)):
Unable to locate python frame

Thread 3 (Thread 0x7f65c2e08700 (LWP 19858)):
Unable to locate python frame

Thread 2 (Thread 0x7f65c2607700 (LWP 19857)):
Unable to locate python frame

Thread 1 (Thread 0x7f65d0f65700 (LWP 19846)):
Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f65680fac60>
  File "/usr/lib/python3.5/threading.py", line 293, in wait
    waiter.acquire()
  File "/home/zreszela/workspace/taurus/lib/taurus/core/util/event.py", line 678, in waitEvent
    self._cond.wait(timeout)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/pool.py", line 463, in start
    evt_wait.waitEvent(DevState.MOVING, after=ts1)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/pool.py", line 256, in new_fn
    return fn(*args, **kwargs)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/pool.py", line 501, in go
    eid = self.start(*args, **kwargs)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/pool.py", line 256, in new_fn
    return fn(*args, **kwargs)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/pool.py", line 2071, in count_raw
    PoolElement.go(self)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/pool.py", line 2107, in go
    return self.count_raw(start_time)
  File "/home/zreszela/workspace/sardana/src/sardana/taurus/core/tango/sardana/test/test_pool.py", line 86, in count
    _, values = mg.count(.1)
  File "/home/zreszela/workspace/taurus/lib/taurus/test/base.py", line 113, in newTest
    return helper(**helper_kwargs)
  File "/usr/lib/python3.5/unittest/case.py", line 601, in run
    testMethod()
  File "/usr/lib/python3.5/unittest/case.py", line 649, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.5/unittest/suite.py", line 122, in run
    test(result)
  File "/usr/lib/python3.5/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python3.5/unittest/runner.py", line 176, in run
    test(result)
  File "/home/zreszela/workspace/sardana/src/sardana/test/testsuite.py", line 87, in run
    result = runner.run(suite)
  File "/home/zreszela/workspace/sardana/src/sardana/test/testsuite.py", line 114, in main
    ret = run(exclude_pattern=args.exclude_pattern)
  File "/home/zreszela/.local/bin/sardanatestsuite", line 11, in <module>
    load_entry_point('sardana', 'console_scripts', 'sardanatestsuite')()
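
The interaction between threads 31 and 17 can be sketched in isolation. Thread 31 (the polling timer) is blocked entering `self.__read_lock` in `poll`, while thread 17 is blocked in `join()` on the timer thread, reached from `push_event`. The sketch below assumes that the `push_event` path still holds the same lock while it joins; the backtrace suggests this but does not show it directly. All names (`run_scenario`, `poll_loop`, `stop_poller`) are hypothetical, and a join timeout is used so the example terminates instead of hanging.

```python
import threading


def run_scenario(join_inside_lock, timeout=0.2):
    """Return True if the poller thread stopped within `timeout` seconds."""
    read_lock = threading.RLock()
    stop_requested = threading.Event()
    about_to_poll = threading.Event()

    def poll_loop():
        # Role of the polling timer thread (thread 31 in the backtrace).
        while not stop_requested.is_set():
            about_to_poll.set()
            with read_lock:  # blocks while the other thread holds the lock
                pass
            stop_requested.wait(0.01)

    poller = threading.Thread(target=poll_loop, daemon=True)

    def stop_poller():
        stop_requested.set()
        poller.join(timeout)  # the code in the backtrace joins without a timeout
        return not poller.is_alive()

    # Role of the event thread (thread 17): it takes the lock first.
    with read_lock:
        poller.start()
        about_to_poll.wait()  # the poller is now committed to taking read_lock
        if join_inside_lock:
            # Joining while holding the lock: the poller cannot make progress.
            return stop_poller()
    # Joining after the lock has been released: the poller can finish.
    return stop_poller()
```

Under these assumptions, `run_scenario(join_inside_lock=True)` should return `False` (the join times out, which would be a hang without the timeout), and `run_scenario(join_inside_lock=False)` should return `True`.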