juggie / AlarmServer

Envisalink 2DS/3/4 Alarm Server (Proxy & HTTP/JSON)
72 stars 67 forks source link

RuntimeError: 'maximum recursion depth exceeded in logger after Envisalink firmware upgrade #78

Open rct opened 1 year ago

rct commented 1 year ago

EDIT/UPDATE:: A bug in Envisalink EVL4 (DSC) firmware version 01.0.203 (2023-01-27) uncovered a bug AlarmServer (or one of the supporting libraries), I've been running the old master branch from 2017) on Python 2.7, tornado 4.4.1).

The bug in the EVL4 firmware caused the EVL4 to continually send 67302, 67303, 67304 (Partition 2,3,4 busy). The traffic was about 30 messages from the Envisalink every second.

What happened was:

  1. AlarmServer would connect and login,
  2. The EVL4 started sending 67302, 67303, 67304 over and over.
  3. AlarmServer would send the 001 command to get status updates.
  4. EVL4 sends the normal status update responses mixed in with 67302, 67303, 67304.
  5. AlarmServer logged this output, which caused some sort of recursion to occur. When the recursion depth hit the Python default recursion limit of 1000, an exception was thrown Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _TracebackLogger.__del__ of <tornado.concurrent. _TracebackLogger object at ...

Note: simply raising Python's recursion limit to 10,000 via sys.setrecursionlimit(10000) did NOT work around the problem. My recollection is that AlarmServer waits a second for the envisalink to stop sending updates to know that the initial state dump is complete. With the envisalink constantly sending messages, AlarmServer never sent anything to the envisalink after the initial 001 status dump.

There is a separate old/(known?) bug that AlarmServer isn't able to recover from an exception thrown while processing data from the envisalink. The AlarmServer process continues to run and other (I'm not sure of the correct terminology here: thread, coroutine, callback) continue to run. Things like ping will still get sent to the envisalink, but no responses will ever be processed.

So the problems/questions are:

  1. (This issue) why did these messages from the envisalink cause recursion to occur and how to fix that ?
  2. (old issue) Any easy way to make AlarmServer able to recover from an exception when receiving, or get AlarmServer to exit and restart if the receive (thread/coroutine/process) gets broken

I'm still investigating but wanted to post this in case someone else sees this problem or has a clue what's going on.

Here's the log:


...
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@158: RX < 610 - Zone 064 Restored
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed zone 64
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:31 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:32 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:33 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6734D4"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 4 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 4
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6732D2"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 2 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 2
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@131: RX RAW < "6733D3"
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_line@158: RX < 673 - Partition 3 is Busy
Jan 27 14:42:34 - DEBUG - core/envisalink.py:handle_event@240: Ignoring unnamed partition 3
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/logging/__init__.py", line 861, in emit
Exception RuntimeError: 'maximum recursion depth exceeded while calling a Python object' in <bound method _TracebackLogger.__del__ of <tornado.concurrent.
_TracebackLogger object at 0xb74d672c>> ignored
rct commented 1 year ago

I've updated/rewritten this issue now that I have a better understanding of what the problem is.

I'm going to try one of the other branches/forks of AlarmServer on Python 3 with a more up-to-date version of Tornado and the other requirements.

Does anyone have a recommended updated branch/fork for AlarmServer?

It looks like @tinuva's fork/master branch might be most up-to-date https://github.com/tinuva/AlarmServer. It's had recent activity in 2021 and seems to have merged @juggie's python3 branch along with the smartthings PRs that occurred in the interim, plus has additional work on a requirements file, Docker support, and some other fixes.

juggie commented 1 year ago

Hey @rct I have to say I'm shocked this is still in use many (many years) after I stopped working on it. But your use case of using it as a proxy and point home assistant at that proxy makes sense!

I brought the python3 branch to life, and also updated the code to support the latest version of Tornado. The stack trace above isn't that helpful, it is happening in the python logger and tornado. Also, as I have a EVL3 I am unable to reproduce (I did have my own problem connecting to my EVL3 from HA but that turned out to be something else.

If I can find some time I'd like to update the code to use pyenvisalink. That would also require me to update from using Tornado to AsyncIO. This code base appears to have been the inspiration for pyenvisalink but I am not sure how much is changed.

I'll have a look at the fork you mentioned as well.

tinuva commented 1 year ago

I may have made some changes on my fork https://github.com/tinuva/AlarmServer But it is definitely not the best fork out there unfortunately.

pyenvisalink as a new project would be better.

I had some plans for my fork but ultimately I don't have time nowadays to work on it. Essentially I wanted to have mqtt support to have the alarmserver expose the panel to home assistant instead of using the home assistant integration which was also not really getting updated. And to add bypass switch support (this is available on the alarmserver web ui in my fork).

@juggie if you want someone to test with EVL4 and DSC 1864 I can definitely help out with that.

my version seem to be very different from @rct

Firmware Version: 01.03.155 - MAC: 001C2A021F06 so 1.3.155

I think @rct meant he is on version 1.5.203 See: https://forum.eyezon.com/viewtopic.php?t=5115

juggie commented 1 year ago

I started on a bit of code last night to use pyenvisalink with AlarmServer. I have that started but pyenvisalink doesn't expose the raw events which would be needed for the proxy functionality. As a result I've started working on a bit of a code cleanup for pyenvisalink as well. The first thing I noticed is that there's likely a bug in its parser leading to partial data and potentially missing events in some scenarios due to how it reads off the TCP socket. It just reads a bunch of data in one shot and assumes it's getting all the events in their entirety. I think the code should be able to be cleaned up a decent amount but it depends on having time!

rct commented 1 year ago

@juggie - maybe this is a good time to enable GitHub discussions on this repo?


Re: My use case for AlarmServer in 2023

@juggie wrote:

I have to say I'm shocked this is still in use many (many years) after I stopped working on it. But your use case of using it as a proxy and point home assistant at that proxy makes sense!

The Envisalink + AlarmServer set to log all the protocol level details from the Envisalink has been very useful for me to monitor, debug, and diagnose stuff especially when I'm remote for long periods. The shutdown of the AT&T 3G cellular network in the US has removed the 3g radio from my DSC system along with the connect to Connect24 and the DSC Surgard based central station, so the sources of unknown stuff have gone away.


Re: firmware versions - @tinuva wrote:

my version seem to be very different from @rct, Firmware Version: 01.03.155

When I got caught by the AT&T 3g shutdown, I replaced the 3g cellular radio (aka communicator) with the EyezOn Sidekick LTE radio gateway last year. EyezOn has been pushing the new 01.04 firmware (and now 01.0) to Envisalink EVL4's that are connected to the Sidekick LTE gateways.

So firmware wise we have:


Re: other forks - @tinuva wrote:

I may have made some changes on my fork https://github.com/tinuva/AlarmServer But it is definitely not the best fork out there unfortunately.

@tinuva - Since you didn't mention it, I'm guessing that while you don't feel your fork is ready for primetime, you haven't found a better fork?

I see your fork is using Tornado 6.1, was that experimental or are you running that everyday?


Re: Tornado vs Python Asyncio

I see from the release notes that since Tornado 5.x in 2018, tornado has moved to better integration with asyncio:

From https://www.tornadoweb.org/en/stable/releases/v5.0.0.html

The focus of this release is improving integration with asyncio. On Python 3, the IOLoop is always a wrapper around the asyncio event loop, and asyncio.Future and asyncio.Task are used instead of their Tornado counterparts. This means that libraries based on asyncio can be mixed relatively seamlessly with those using Tornado. While care has been taken to minimize the disruption from this change, code changes may be required for compatibility with Tornado 5.0, as detailed in the following section.

@juggie wrote:

I brought the python3 branch to life, and also updated the code to support the latest version of Tornado.

That's great. Let me know when I should give that branch a try (at a minimum connected to my existing AlarmServer proxy).


Re: state of Pyenvisalink/MQTT gateway

@tinuva wrote:

Essentially I wanted to have mqtt support to have the alarmserver expose the panel to home assistant instead of using the home assistant integration which was also not really getting updated.

I agree with all of that. Currently pyenvisalink mostly works as long as nothing goes wrong, but I can't depend on it and have the backups of the EyezOn notifications/mobile app/web portal as well as AlarmServer.

Ideally we'd have an MQTT gateway to the Envisalink. The MQTT stuff is often able to evolve much more quickly being somewhat decoupled from Home Assistant releases. I've been using ZwaveJS2MQTT (Now called ZwaveJS UI) and have been both very happy and impressed. I've been doing some Zigbee testing in 2023 and Zigbee2MQTT seems to me to be the better choice than the built in ZHA.

Tangential Note: I've been using an RTL-SDR -> rtl_433 -> MQTT to also expose my RF devices, including all my wireless DSC sensors, weather, and environmental sensors to Home Assistant.


Re: Adding Bypass to AlarmServer

@tinuva wrote:

And to add bypass switch support (this is available on the alarmserver web ui in my fork).

This would be nice to have, can we get that. It would be nice to get that merged into a new Python 3/Tornado 6.x version.

juggie commented 1 year ago

@rct I enabled discussions feel free to start one

tinuva commented 1 year ago

@tinuva - Since you didn't mention it, I'm guessing that while you don't feel your fork is ready for primetime, you haven't found a better fork? That is correct, I did look but haven't found anything.

I see your fork is using Tornado 6.1, was that experimental or are you running that everyday? I am running my fork every day yes. I honestly, can't remember why, but that change happened because of an issue that I can't remember what it was.

Ideally we'd have an MQTT gateway to the Envisalink. The MQTT stuff is often able to evolve much more quickly being somewhat decoupled from Home Assistant releases. I've been using ZwaveJS2MQTT (Now called ZwaveJS UI) and have been both very happy and impressed. I've been doing some Zigbee testing in 2023 and Zigbee2MQTT seems to me to be the better choice than the built in ZHA. Exactly. I use zigbee2mqtt myself.

The goal was to model alarmserver to do something similar to what PAI (Paradox Alarm Interface) is doing for Paradox alarms.

Re: Adding Bypass to AlarmServer

@tinuva wrote:

And to add bypass switch support (this is available on the alarmserver web ui in my fork).

This would be nice to have, can we get that. It would be nice to get that merged into a new Python 3/Tornado 6.x version.

I sure this is absolutely possible. This is what it looks like:

Screenshot 2023-01-29 at 17 56 40

The B1 and B2 buttons etc is the bypass buttons I added in and the log reflects it.

rct commented 1 year ago

@rct I enabled discussions feel free to start one

See #80 (Current state of AlarmServer) - for an encapsulation of some of what we've touched on in this issue and other threads.