ljean / modbus-tk

Create Modbus app easily with Python
Other
566 stars 212 forks source link

Deadlock with `modbus.Slave._data_lock` and `hooks._LOCK` #140

Open roryyorke opened 3 years ago

roryyorke commented 3 years ago

We hit this as follows:

For these two cases, the order of lock acquisition is reversed: in Thread-1, hooks._LOCK is acquired before the before_send hook is called, and get_values acquires the Slave object _data_lock. Meanwhile, in MainThread, set_values first acquires the Slave _data_lock, and then calls hook modbus.ModbusBlock.setitem, so needing to acquire hooks._LOCK.

It can happen that Thread-1 acquires hooks._LOCK, then MainThread acquires _data_lock, and that is deadlock.

ljean commented 3 years ago

Thanks for raising this problem? You seems to have a potential fix for it, is this possible that you share your code? I will not have time to care of this in a short term

roryyorke commented 3 years ago

Since it removes capability, what we have is a workaround, not a fix, I have pushed a contrived example and the workaround to https://github.com/roryyorke/modbus-tk/tree/rory/remove-locks

The below I tested on an Ubuntu 20.04 system using Python 3.8:

>>> import sys; print(sys.version); print(sys.implementation)
3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0]
namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-38', hexversion=50857456, name='cpython', version=sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0))

On revision cea9ee7 (pre-fix), when I first run deadlock_server.py and then deadlock_client.py, the server shows (snipping lots of initial output):

main 2
('127.0.0.1', 57844) is connected with socket 4...
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
before_send hook 1
main 1
before_send hook 2
before_send hook 3

and outputs nothing further.

The client times out:

2021-01-24 10:01:58,435 INFO    deadlock_client.main    MainThread  connected
Traceback (most recent call last):
  File "deadlock_client.py", line 28, in <module>
    main()
  File "deadlock_client.py", line 24, in main
    logger.info(master.execute(1, cst.READ_HOLDING_REGISTERS, 0, 3))
  File "/home/rory/src/modbus-tk/modbus_tk/utils.py", line 39, in new
    raise excpt
  File "/home/rory/src/modbus-tk/modbus_tk/utils.py", line 37, in new
    ret = fcn(*args, **kwargs)
  File "/home/rory/src/modbus-tk/modbus_tk/modbus.py", line 298, in execute
    response = self._recv(expected_length)
  File "/home/rory/src/modbus-tk/modbus_tk/modbus_tcp.py", line 216, in _recv
    rcv_byte = self._sock.recv(1)
socket.timeout: timed out

The server is stuck. From the output, you can see it got to the .set_values call in main, and the .get_values call in the hook.

If you remove the hook installation, or the .get_values call, the server and client continue. I didn't try removing the .set_values call, I presume that would work too.

In revision dbef15e, the server and client continue, since I've removed the modbus.ModbusBlock.setitem hook altogether.