Open roryyorke opened 3 years ago
Thanks for raising this problem? You seems to have a potential fix for it, is this possible that you share your code? I will not have time to care of this in a short term
Since it removes capability, what we have is a workaround, not a fix, I have pushed a contrived example and the workaround to https://github.com/roryyorke/modbus-tk/tree/rory/remove-locks
The below I tested on an Ubuntu 20.04 system using Python 3.8:
>>> import sys; print(sys.version); print(sys.implementation)
3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-38', hexversion=50857456, name='cpython', version=sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0))
On revision cea9ee7 (pre-fix), when I first run deadlock_server.py
and then deadlock_client.py
, the server shows (snipping lots of initial output):
main 2
('127.0.0.1', 57844) is connected with socket 4...
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
before_send hook 1
main 1
before_send hook 2
before_send hook 3
and outputs nothing further.
The client times out:
2021-01-24 10:01:58,435 INFO deadlock_client.main MainThread connected
Traceback (most recent call last):
File "deadlock_client.py", line 28, in <module>
main()
File "deadlock_client.py", line 24, in main
logger.info(master.execute(1, cst.READ_HOLDING_REGISTERS, 0, 3))
File "/home/rory/src/modbus-tk/modbus_tk/utils.py", line 39, in new
raise excpt
File "/home/rory/src/modbus-tk/modbus_tk/utils.py", line 37, in new
ret = fcn(*args, **kwargs)
File "/home/rory/src/modbus-tk/modbus_tk/modbus.py", line 298, in execute
response = self._recv(expected_length)
File "/home/rory/src/modbus-tk/modbus_tk/modbus_tcp.py", line 216, in _recv
rcv_byte = self._sock.recv(1)
socket.timeout: timed out
The server is stuck. From the output, you can see it got to the .set_values
call in main, and the .get_values
call in the hook.
If you remove the hook installation, or the .get_values
call, the server and client continue. I didn't try removing the .set_values
call, I presume that would work too.
In revision dbef15e, the server and client continue, since I've removed the modbus.ModbusBlock.setitem
hook altogether.
We hit this as follows:
modbus_tcp.TcpServer.before_send
hook, we callmodbus.Slave.get_values
; this runs in the TCP server thread (Thread-1
)modbus.Slave.set_values
(this isMainThread
)For these two cases, the order of lock acquisition is reversed: in
Thread-1
,hooks._LOCK
is acquired before thebefore_send
hook is called, andget_values
acquires theSlave
object_data_lock
. Meanwhile, inMainThread
,set_values
first acquires theSlave
_data_lock
, and then calls hookmodbus.ModbusBlock.setitem
, so needing to acquirehooks._LOCK
.It can happen that
Thread-1
acquireshooks._LOCK
, thenMainThread
acquires_data_lock
, and that is deadlock.