node-alarm-dot-com / homebridge-node-alarm-dot-com

Alarm.com plugin for Homebridge using Node.js
MIT License

[MEGATHREAD] ECONNECT errors and unresponsive plugin #72

Closed chase9 closed 3 years ago

chase9 commented 3 years ago

Overview

I'm creating a megathread for us to pool thoughts and spread information. For the past few months, more and more people have been experiencing errors in their logs to the effect of:

Error: GET https://www.alarm.com/web/api/devices/sensors?idsxxxxx failed, reason: read ECONNRESET
    at /homebridge/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:470:15
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 1)
    at async Promise.all (index 0)

This error may or may not be accompanied by your accessories becoming unresponsive.

Why is this happening?

I believe Alarm.com has started rate limiting calls to their APIs. This is completely fair, and likely meant to prevent DDoS attacks, but it means that our plugin is having trouble because of how "chatty" it is.

What can I do right now?

My testing has led me to believe that they're limiting people's IP addresses once a threshold of requests has been reached. This means you can (likely) get around the restriction by doing two things:

  1. Do this first! Increase the values of authTimeoutMinutes and pollTimeoutSeconds. I would try setting authTimeoutMinutes to 60 and pollTimeoutSeconds to 300. This will reduce how chatty the plugin is, at the cost of making it less responsive to external changes. Doing this may help prevent you from getting banned again.
  2. Restart your home's internet modem. For most people, this will get you a new external IP address so that you're no longer banned.
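
To illustrate why a longer poll interval helps, here is a rough back-of-the-envelope sketch in TypeScript. It assumes each poll cycle issues a fixed number of HTTP requests; `requestsPerPoll` is a hypothetical figure chosen for illustration, not something measured from the plugin.

```typescript
// Rough estimate of how many requests a polling client sends per hour.
// `requestsPerPoll` is a hypothetical number for illustration; the real
// plugin may issue more or fewer calls per refresh.
function requestsPerHour(pollTimeoutSeconds: number, requestsPerPoll: number): number {
  if (pollTimeoutSeconds <= 0) {
    throw new RangeError("pollTimeoutSeconds must be positive");
  }
  const pollsPerHour = 3600 / pollTimeoutSeconds;
  return pollsPerHour * requestsPerPoll;
}

// Polling every 60 s versus the suggested 300 s, assuming 4 requests per poll.
const before = requestsPerHour(60, 4);
const after = requestsPerHour(300, 4);
console.log(`before: ${before} requests/h, after: ${after} requests/h`);
```

Under these assumptions, going from 60 to 300 seconds cuts the hourly request count to one fifth, which is the whole point of the first step.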

How can we fix this long-term?

Assuming this is the actual cause of the problem, the fix would be increasing the defaults for the above values. Since this will reduce how responsive the plugin is, we will need to build dynamic code so local device changes are properly reflected. Unfortunately I'm not sure if there's any way around making the plugin less responsive to external changes.

We should also at some point comb through the code to see if it's feasible to reduce the number of API calls it makes. This may help prevent people from getting banned.

I'm sorry to the people this has inconvenienced.

ifeign commented 3 years ago

I reinstalled to give your advice a try, but I have been unable to change my public IP, so I'm now experiencing a new issue: only my panel and garage door are being discovered, they both work though.

DMBlakeley commented 3 years ago

I changed the values for authTimeoutMinutes and pollTimeoutSeconds to the recommended values and have let it run for the last 2 days. I am consistently getting the following error every 5 minutes and I am not able to control Alarm.com:

[5/16/2021, 7:42:14 AM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/2736084 failed: [object Object]
    at /usr/local/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:470:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 0)

Also have same issue as @ifeign where I cannot change my public IP as this seems to be locked to my modem's MAC address.

Question, if I have been blocked by Alarm.com and I cannot change my public IP am I basically out of luck using this plugin?

chase9 commented 3 years ago

In the latest beta (1.7.2-beta.4) I added some randomness to the device refresh, so whatever value you put will have between 0-5 minutes added. I don't think any one thing will solve this problem, but eventually we'll have enough variance to not get caught.
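
For anyone curious what that kind of randomness looks like, here is a minimal sketch. The 0-5 minute range is the one described above; the function and parameter names are hypothetical and this is not the plugin's actual code.

```typescript
// Sketch of adding random jitter to a polling interval.
// The 0-5 minute range comes from the comment above; the names here are
// hypothetical and do not mirror the plugin's source.
const MAX_JITTER_SECONDS = 5 * 60;

function nextPollDelayMs(
  pollTimeoutSeconds: number,
  random: () => number = Math.random // injectable so the jitter can be tested
): number {
  const jitterSeconds = random() * MAX_JITTER_SECONDS;
  return Math.round((pollTimeoutSeconds + jitterSeconds) * 1000);
}

// With a configured 60 s interval, each refresh is scheduled at least 60 s
// and less than 360 s from now.
console.log(nextPollDelayMs(60));
```

Note that everything is kept in seconds until the final conversion to milliseconds, since mixing units is an easy way to end up with an interval far longer than intended.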

Eventually your MAC address will switch, and if your mobile app still works we can assume there's a way around this. The issue is that I don't have enough time or energy to troubleshoot this forever...

ifeign commented 3 years ago

For those of us with Qolsys panels, which is probably quite a few, addressing my issue would fix polling of sensors, meaning potentially less server load - you’d just have to poll when something like a garage door or lock is used, vs constantly checking sensor status https://github.com/node-alarm-dot-com/homebridge-node-alarm-dot-com/issues/71

anthonyb82 commented 3 years ago

Chase, first and foremost, thank you so much for all the work you have done on this plugin. It has been one of my favorites on homebridge but I can’t imagine the headache it’s caused you since this error.

I was wondering, is it possible to strip the plug-in down to just having the ability to set the arm state of the system and not pull in all of the accessories? Having the arm state connected to “good night” scenes, away from home geofencing etc is absolutely awesome. My thought was perhaps it would only need to log in when an arm/disarm command is sent.
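
A rough sketch of that idea in TypeScript, in case it helps the discussion: log in lazily, only when a command is about to be sent, and reuse the session until it is considered stale. Everything here is hypothetical. `login` and `send` are placeholder callbacks standing in for real Alarm.com calls, and `authTimeoutMinutes` is borrowed from the plugin's setting name purely for illustration.

```typescript
// Sketch of "log in only when a command is sent". `login` and `send` are
// hypothetical callbacks, not functions from node-alarm-dot-com.
type Session = { token: string; createdAt: number };

class LazySession {
  private session: Session | null = null;
  private readonly login: () => Promise<string>;
  private readonly authTimeoutMinutes: number;
  private readonly now: () => number;

  constructor(
    login: () => Promise<string>,
    authTimeoutMinutes: number,
    now: () => number = Date.now // injectable clock for testing
  ) {
    this.login = login;
    this.authTimeoutMinutes = authTimeoutMinutes;
    this.now = now;
  }

  // Returns a cached token, logging in only if none exists or it has expired.
  private async token(): Promise<string> {
    const maxAgeMs = this.authTimeoutMinutes * 60 * 1000;
    if (this.session === null || this.now() - this.session.createdAt >= maxAgeMs) {
      this.session = { token: await this.login(), createdAt: this.now() };
    }
    return this.session.token;
  }

  // Runs a command (for example arm or disarm) with a valid token.
  async run<T>(send: (token: string) => Promise<T>): Promise<T> {
    return send(await this.token());
  }
}
```

With this shape there is no background polling at all, so the only traffic is one login per stale session plus one request per command. The trade-off is that the arm state shown in HomeKit would not update when the panel is changed elsewhere.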

knuckleheadsmiff commented 3 years ago

I changed the values for authTimeoutMinutes and pollTimeoutSeconds to the recommended values and have let it run for the last 2 days. I am consistently getting the following error every 5 minutes and I am not able to control Alarm.com:

[5/16/2021, 7:42:14 AM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/2736084 failed: [object Object]
    at /usr/local/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:470:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 0)

I too am now seeing this after changing the settings as suggested.

(I only noticed seeing this in my logs when I recently upgraded to a much newer version of nodejs.)

DMBlakeley commented 3 years ago

I will add to the thanks for this plug-in. It worked flawlessly for quite some time which is pretty amazing!

I have Homebridge running on a Mac Mini. What I find really strange is that I can login to alarm.com using Safari on this Mac, however, the alarm-dot-com plugin running on the same Mac fails at login.

ngori commented 3 years ago

@chase9 Hi Chase, I installed beta 9 this morning. I was able to control a door lock 1 time before getting the following:

[5/18/2021, 10:15:57 AM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/5349741 failed: [object Object]
    at /usr/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:464:15
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async Promise.all (index 0)

Plugin and all devices have become unresponsive now. Not sure how but ADC appears to be detecting plugin connections vs app or web interface connections almost immediately.

DMBlakeley commented 3 years ago

Hi Chase, similar behavior as @ngori. Installed beta 9, configured and rebooted. On initial boot, login was successful returning registered panel and devices as well as initial state (motion detected). First sample loop returned the ECONNECT error and devices no longer updated.

I only have alarm system with no additional devices such as lights and garage door. Login through webpage and app occur without issue.

dkolb commented 3 years ago

Hello. I just wanted to drop back by and report on this comment mentioning the Home Assistant plugin.

uvjustin/alarmdotcom and uvjustin/pyalarmdotcomajax also suffer from this issue for me. The difference is, as best I can tell, this integration only supports arming and disarming so it kinda brute forces the operation. That probably explains the lack of complaints on those repositories.

raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host www.alarm.com:443 ssl:default [Connection reset by peer]
2021-05-20 22:08:01 WARNING (MainThread) [homeassistant.helpers.entity] Update of alarm_control_panel.alarm_com is taking over 10 seconds
2021-05-20 22:08:07 ERROR (MainThread) [pyalarmdotcomajax.pyalarmdotcomajax] Can not load state data from Alarm.com
2021-05-20 22:08:31 WARNING (MainThread) [homeassistant.helpers.entity] Update of alarm_control_panel.alarm_com is taking over 10 seconds
2021-05-20 22:08:37 ERROR (MainThread) [pyalarmdotcomajax.pyalarmdotcomajax] Can not load state data from Alarm.com

I think we can pretty conclusively say we are triggering some sort of defensive DDoS protection on their side. Anyway, now to uninstall HA from my k8s cluster. :-D

ifeign commented 3 years ago

It’s very unlikely to work, but has anyone tried reaching out to the Alarm.com dev team? This guy, for example https://www.linkedin.com/in/akshaybaviskar

ngori commented 3 years ago

I believe we had a former ADC dev assisting for a while. To your point it's worth a shot though.

Couple other thoughts/brainstorms:

I thought there was a version of an ADC plugin at one point doing a screen scrape. Very inefficient vs the APIs but possibly an option that wouldn't get blocked.

The other thought was routing this plug-in traffic through a VPN service and varying the end points. If the denial is just based on IP (this seems possible, given there has been limited success with new WAN IPs) this might work. If ADC is rate limiting at the account level it wouldn't help. Not sure how feasible it would be to implement though, given you would need a commercial VPN service.

jfmach commented 3 years ago

Hello,

Bit of feedback the other way around: I use 1.7.1, I have 30+ sensors/detectors and while I have a lot of the ECONNRESET errors, generally speaking the system is working and reporting very well.

I have authTimeoutMinutes set to 30 and I didn't set pollTimeoutSeconds so it's set to whatever the default is.

I do have an issue with the arming/disarming of the system but I worked around it by creating dummy switches (homebridge-dummy). Instead of arming/disarming from the homebridge-node-alarm-dot-com alarm accessory, my automations interact with the dummy switches. The dummy switches automatically turn off after 30 seconds and I have additional automations that turn the alarm on (or off) when the dummy switches are turned on AND off.

That works well. Often it doesn't change the alarm state when the dummy switch is turned on, but it does work after 30 seconds when it turns off.

So it seems to me that the problem is that something somewhere needs to be refreshed, and it's refreshed by the initial attempt at changing the status of the alarm.

If I can help troubleshoot this let me know.

Great plugin for me, thanks for all the hard work!

ifeign commented 3 years ago

Has any progress been made on this? Just my luck that I moved into a house that came with an Alarm.com system only for this to happen lol

Elder-HVAC-Man commented 3 years ago

@jfmach , how do you control the alarm on and off states with dummy switches? That would solve this problem for me. Thanks.

ifeign commented 3 years ago

So, I decided to give the plugin another try. As of this writing, my alarm system, lock and garage door work. BUT I don't have any door, window or motion sensors. This isn't the biggest deal, but it's a little strange. I cleared my accessory cache and it didn't change anything. @chase9 any suggestions?

In a way, this is a slight blessing - I am working on setting up Home Assistant and the local Qolsys plugin I've mentioned previously. This would give me realtime sensors, allowing me to exclude everything in the Homebridge plugin except my lock and garage door.

EDIT: Nvm, I was using a different login from my primary account and had it set to "limited device access", which I guess blocks contact sensors. Changing it to "full control" revealed my sensors.

DMBlakeley commented 3 years ago

I unloaded the plugin a couple of weeks ago. Retried yesterday and initially thought that the issue had cleared on the Alarm.com end. Within 30 minutes the ECONNECT errors returned and the plugin would not respond. Started looking over the code just for understanding. As the webpage and iOS app work just fine, I'm wondering if there is a problem in the way the ‘node-fetch’ module queries the Alarm.com servers.

ifeign commented 3 years ago

I unloaded the plugin a couple of weeks ago. Retried yesterday and initially thought that the issue had cleared on the Alarm.com end. Within 30 minutes the ECONNECT errors returned and the plugin would not respond.

Are you using the beta release? It’s been almost perfect for me. Sure, things are a little slower due to the reduced polling, but it’s been 90% reliable. I haven't seen ECONNECT errors, but have seen other random timeouts that eventually resolved themselves

DMBlakeley commented 3 years ago

Yes, I am using the beta. What do you have your sampling settings at? Default or higher?

ifeign commented 3 years ago

Default, timeout 10, polling 60

DMBlakeley commented 3 years ago

Thanks. I have homebridge running on an M1 Mac Mini. Generated a new Safari mfaCookie for the Mini and reloaded beta-9 of the plug-in with default values. Will see if errors are generated and if so impact on functionality.

ifeign commented 3 years ago

Thanks. I have homebridge running on an M1 Mac Mini. Generated a new Safari mfaCookie for the Mini and reloaded beta-9 of the plug-in with default values. Will see if errors are generated and if so impact on functionality.

IDK if it helps, but I made a secondary alarm.com account specifically to use with this plugin and to reduce server activity from my primary account - added bonus is you can revoke that account should you suspect any security breach

DMBlakeley commented 3 years ago

I thought of giving that a try. With logging level at 4, I am getting the following error:

[6/13/2021, 4:35:18 PM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/2736084 failed: [object Object]
    at /usr/local/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:464:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 0)

problem or not?

ifeign commented 3 years ago

That looks like an error I’ve been getting, that seems to make the plugin unresponsive briefly. It seems to resolve itself though, usually a child bridge restart (or homebridge restart if you haven’t set up child bridges) fixes it if I’m not patient enough.

Side note, make sure your secondary account has full control, at least in my case, contact sensors wouldn’t show up until I did so, even though everything else did

DMBlakeley commented 3 years ago

Set up a secondary account just for homebridge with full control. All sensors are visible.

Do you recommend running plugin in a child bridge?

ifeign commented 3 years ago

Set up a secondary account just for homebridge with full control. All sensors are visible.

Do you recommend running plugin in a child bridge?

I like the convenience of child bridges, I can restart one platform without restarting the whole server. You’ll have to re-add all your related accessories if you do move to a child bridge, they also use more RAM, the more bridges you have. Thankfully I’ve got an 8gb Pi 4 because I’ve isolated almost everything lol, but I have had a multitude of child bridges running fine on a Pi 3

DMBlakeley commented 3 years ago

Using the secondary account I am still occasionally getting the same error but interaction with Alarm.com seems to be working. Will run with this configuration for now. Would like to try out child bridges but will leave that for another day.

nflute commented 3 years ago

I have no knowledge of coding but hope this will help or give some clue. I reinstalled v1.7.2-beta.4 (the first beta that supported 2FA) and I have had no error messages so far.

DMBlakeley commented 3 years ago

This is a great observation. I loaded v1.7.2-beta.4 and did not observe error messages. Looking more closely at the overnight log I find a couple occurrences of my previously reported error but no ECONNECT errors. Log is showing that security system is polling every 10 minutes. Much different behavior than beta.9.

scottleestrange commented 3 years ago

This still happens

ifeign commented 3 years ago

On which release? I’m kinda wondering if some of the people who rolled back to the earlier beta will get flagged by the server and go back to ECONNECT errors.

DMBlakeley commented 3 years ago

I am trying to see if I can start with beta-4 and add in the code changes stepwise to see where the problem occurs.

The first item I found was that the randomized timer was I believe 10x higher than planned resulting in a very long polling interval. Have submitted comment to this code change.

DMBlakeley commented 3 years ago

I believe I found the issue with beta-9 but want to do more testing to make sure.

UPDATE - I have submitted a pull request with the polling randomizing factor correction. With this change, beta.9 has been running well. A couple of errors in 24 hours but none that impacted plug-in operation. As the default values are being used, the plug-in is also reasonably responsive. Beta.6 and higher also handle caching and restore of sensors in a much better manner than beta.4 did, which results in a much cleaner startup of homebridge.

scottleestrange commented 3 years ago

Secondary accounts are not the answer. Some of us cannot create one. Alarm companies that work with alarm.com have that option and restrict it. For now I cannot create a secondary account. I was trying to help test.

ifeign commented 3 years ago

Secondary accounts were more of a theory anyway. My main reason for using one is that if my homebridge passwords ever leak, I don't leak the info from my primary account.

Have you asked your provider to lift the limitation? Seems kinda odd they’d restrict that in the first place

scottleestrange commented 3 years ago

Well, I realized that. I gathered that from the e-mail chain. Just pointing out my observations for your theory.

chase9 commented 3 years ago

Hi all,

The changes from @DMBlakeley have been merged into the latest beta (Beta 11). Could you please update to the latest beta and let me know if you run into any problems? I've been running it for a day now and have seen the plugin successfully recover from an ECONNECT error.
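
As an illustration of what "recovering" from a dropped connection can look like in general, here is a minimal retry-with-backoff sketch in TypeScript. It is not the logic that was merged into the beta. It assumes the thrown error exposes Node's string `code` property (such as "ECONNRESET"); `withRetry`, `isConnectionReset`, and their parameters are hypothetical names.

```typescript
// Sketch of retrying a request after a transient connection reset.
// Illustration only; this is not the plugin's actual recovery logic.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumes the error carries a Node-style string `code`, e.g. "ECONNRESET".
function isConnectionReset(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "ECONNRESET"
  );
}

async function withRetry<T>(
  request: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
  sleep: Sleep = defaultSleep // injectable so tests do not have to wait
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up on other errors or once the attempts are exhausted.
      if (!isConnectionReset(err) || attempt >= maxAttempts) throw err;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Backing off between attempts matters here: retrying immediately would add to the very request volume that seems to trigger the blocking in the first place.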

anthonyb82 commented 3 years ago

Hi, I’ve been running beta 11 since it was pushed last week. Have seen significant improvements- at best, I receive a refresh devices error a couple of times a day that quickly resolves itself.

Anthony

Elder-HVAC-Man commented 3 years ago

How do I go about loading and installing the latest beta (Beta 11) into Homebridge?

DMBlakeley commented 3 years ago

If you are using Homebridge UI for plugin configuration, go to the Plugins page and select the "wrench" symbol and then "Install previous version". You will find beta.11 as an option.