smarthomeNG / smarthome

Device integration platform for your smart home
https://www.smarthomeNG.de
GNU General Public License v3.0
121 stars 92 forks source link

lib.network: KNX plugin: Thread can only be started once #494

Closed onkelandy closed 9 months ago

onkelandy commented 2 years ago

Not totally sure what leads to that error.. maybe we can at least catch the error in a nice error message?

Jun 07 00:15:51 ERROR    KNX poll jalousien.og.westen_fluegel.hoehe Method KNX poll jalousien.og.westen_fluegel.hoehe exception: threads can only be started once
Traceback (most recent call last):
  File "/usr/local/smarthome/lib/scheduler.py", line 659, in _task
    obj(**value)
  File "/usr/local/smarthome/plugins/knx/__init__.py", line 252, in _poll
    self.groupread(kwargs['ga'])
  File "/usr/local/smarthome/plugins/knx/__init__.py", line 243, in groupread
    self._send(pkt)
  File "/usr/local/smarthome/plugins/knx/__init__.py", line 191, in _send
    self._client.send(send)
  File "/usr/local/smarthome/lib/network.py", line 680, in send
    self.connect()
  File "/usr/local/smarthome/lib/network.py", line 650, in connect
    self.__connect_thread.start()
  File "/usr/lib/python3.9/threading.py", line 869, in start
    raise RuntimeError("threads can only be started once")
RuntimeError: threads can only be started once
bmxp commented 1 year ago

Did you get this error again?

onkelandy commented 1 year ago

Haven't tried with latest version but yes this error came up multiple times. I think esp. when you restart shng several times within a shorter time period, e.g. within 20-30 seconds or so. Not sure.

msinn commented 1 year ago

Is this issue still relevant?

Morg42 commented 1 year ago

Catching the error is trivial (enclosing the thread.start-call in try..except), but I'd like to find out the reason for multiple starts. This might be a timing problem on short interval disconnects and sends or restarts (via adminUI or stopping and starting on the console?)

Also see gitter ;)

Morg42 commented 1 year ago

Found 2 occurrences on 2022-02-15, but not anymore since. Always current master version

onkelandy commented 1 year ago

Currently can't reproduce the issue with latest develop.. not sure if it's fixed..?

msinn commented 1 year ago

I am going to close the issue as resolved. If it should happen again, the issue can be reopened.

onkelandy commented 1 year ago

It just happened again.. Here is the network log, maybe helpful..?

Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.eg.osten_buero.hoehe (plugins.knx_127.0.0.1:6720) autoconnecting on send attempt, message is bytearray(b"\x00\x06\x00\'\x11\x12\x00\x00")  --  (network.py:send:695)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.og.sueden_schiebetuere.hoehe (plugins.knx_127.0.0.1:6720) autoconnecting on send attempt, message is bytearray(b"\x00\x06\x00\'\x19&\x00\x00")  --  (network.py:send:695)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.eg.osten_buero.hoehe Starting connect to 127.0.0.1:6720  --  (network.py:connect:658)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.og.sueden_schiebetuere.hoehe Starting connect to 127.0.0.1:6720  --  (network.py:connect:658)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.eg.osten_buero.hoehe connect() to 127.0.0.1:6720: self.__running=False, self.__connect_thread.is_alive()=False  --  (network.py:connect:662)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.og.sueden_schiebetuere.hoehe connect() to 127.0.0.1:6720: self.__running=False, self.__connect_thread.is_alive()=False  --  (network.py:connect:662)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.eg.osten_buero.hoehe connect() to 127.0.0.1:6720: calling __connect_thread.start()  --  (network.py:connect:664)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.og.sueden_schiebetuere.hoehe connect() to 127.0.0.1:6720: calling __connect_thread.start()  --  (network.py:connect:664)
Mär 29 22:28:25 CEST DEBUG    lib.network       TCP_Connect (plugins.knx_127.0.0.1:6720) (plugins.knx_127.0.0.1:6720) starting connection cycle  --  (network.py:_connect_thread_worker:743)
Mär 29 22:28:25 CEST DEBUG    lib.network       KNX poll jalousien.eg.osten_buero.hoehe leaving connect() to 127.0.0.1:6720  --  (network.py:connect:666)
Mär 29 22:28:25 CEST DEBUG    lib.network       TCP_Connect (plugins.knx_127.0.0.1:6720) (plugins.knx_127.0.0.1:6720) connecting using TCP/IPv4 with autoreconnect  --  (network.py:_connect:783)
Mär 29 22:28:25 CEST INFO     lib.network       TCP_Connect (plugins.knx_127.0.0.1:6720) (plugins.knx_127.0.0.1:6720) connected  --  (network.py:_connect:792)

Error itself:

Mär 29 22:28:25 ERROR    KNX poll jalousien.og.sueden_schiebetuere.hoehe Method KNX poll jalousien.og.sueden_schiebetuere.hoehe exception: threads can only be started once
Traceback (most recent call last):
  File "/usr/local/smarthome/lib/scheduler.py", line 692, in _task
    obj(**value)
  File "/usr/local/smarthome/plugins/knx/__init__.py", line 228, in _poll
    self.groupread(kwargs['ga'])
  File "/usr/local/smarthome/plugins/knx/__init__.py", line 219, in groupread
    self._send(pkt)
  File "/usr/local/smarthome/plugins/knx/__init__.py", line 167, in _send
    self._client.send(send)
  File "/usr/local/smarthome/lib/network.py", line 696, in send
    self.connect()
  File "/usr/local/smarthome/lib/network.py", line 665, in connect
    self.__connect_thread.start()
  File "/usr/lib/python3.9/threading.py", line 869, in start
    raise RuntimeError("threads can only be started once")
RuntimeError: threads can only be started once

KNX Busmonitor shows that no commands were sent or received at that time

Mär 29 22:26:12  - Nutze Feiertage für Land 'AT', Provinz '9', benutzerdefinierte(r) Feiertag(e) nicht definiert;
Mär 29 22:28:33 ;1.1.32;0/6/4;False;
Morg42 commented 1 year ago

Its plain to see that two KNX poll requests are submitted at "the same time" (at least inside the same second as the whole "incident") - both requests initiate autoreconnect, trigger connect, trigger connection_cycle_start.

Maybe switching from a locking flag to using "real" locks might solve this, maybe just try/excepting the thread start call might achieve similar results.

I'll take another look into lib.network.

(I'm wondering why the connection is dropped in the first place...?)

Morg42 commented 1 year ago

Proposed fix https://github.com/smarthomeNG/smarthome/pull/543

Morg42 commented 1 year ago

As long as no further negative feedback is given, I'll close this as resolved.

onkelandy commented 1 year ago

Hm, didn't we have to revert the fix? Don't think this issue is really fixed.. Let me know, we close it again if I'm wrong ;)

Morg42 commented 1 year ago

those who (still) have the issue, please test if the fix works for you

onkelandy commented 10 months ago

with the latest commit I think this issue is gone..let's test some more days/weeks and then close. Thx