krahabb / meross_lan

Home Assistant integration for Meross devices
MIT License
443 stars 47 forks source link

MSG100 Goes unavailable shortly after being rebooted. #132

Closed TheOneOgre closed 2 years ago

TheOneOgre commented 2 years ago

Recently setup the Meross_Lan integration and had everything working perfectly! After about 24 hours, the device will now not stay available even though it still works through the meross app. This is not fixed by a reboot of either the device, or Home Asisstant. A reboot will sometimes fix it temporarily, but it will be unavailable again in just a few minutes. Not sure what has happened as I have changed nothing since the inital setup.

The debug log attached below includes a manual reboot of the device by me unplugging it and plugging it back in.

2021/12/11 - 20:53:20 RX http GETACK Appliance.System.All {"all": {"system": {"hardware": {"type": "msg100", "subType": "us", "version": "3.5.0", "chipType": "mt7682", "uuid": "################################", "macAddress": "#################"}, "firmware": {"version": "3.2.5", "compileTime": "2021/09/16 13:47:36 GMT +08:00", "encrypt": 1, "wifiMac": "#################", "innerIp": "#########", "server": "##################", "port": "###", "userId": "#######"}, "time": {"timestamp": 1639274000, "timezone": "America/New_York", "timeRule": [[1636264800, -18000, 0], [1647154800, -14400, 1], [1667714400, -18000, 0], [1678604400, -14400, 1], [1699164000, -18000, 0], [1710054000, -14400, 1], [1730613600, -18000, 0], [1741503600, -14400, 1], [1762063200, -18000, 0], [1772953200, -14400, 1], [1793512800, -18000, 0], [1805007600, -14400, 1], [1825567200, -18000, 0], [1836457200, -14400, 1], [1857016800, -18000, 0], [1867906800, -14400, 1], [1888466400, -18000, 0], [1899356400, -14400, 1], [1919916000, -18000, 0], [1930806000, -14400, 1]]}, "online": {"status": 1}}, "digest": {"togglex": [{"channel": 0, "onoff": 1, "lmTime": 0}], "triggerx": [{"channel": 0, "id": "1111111111111111", "count": 2}], "timerx": [], "garageDoor": [{"channel": 0, "open": 0, "lmTime": 1639267849}]}}} 2021/12/11 - 20:53:20 auto LOG DEBUG MerossDevice(2104011676771136101348e1e9696f5c) back online! 2021/12/11 - 20:53:20 TX http GET Appliance.System.DNDMode {"DNDMode": {}} 2021/12/11 - 20:53:21 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(2) error in async_http_request: Server disconnected 2021/12/11 - 20:53:21 auto LOG DEBUG MerossDevice(2104011676771136101348e1e9696f5c) going offline! 2021/12/11 - 20:53:21 auto LOG WARNING MerossDevice(2104011676771136101348e1e9696f5c) error in async_http_request: Server disconnected 2021/12/11 - 20:53:22 auto LOG WARNING MerossDevice(2104011676771136101348e1e9696f5c) error in async_http_request: Server disconnected 2021/12/11 - 20:53:42 TX http GET Appliance.System.All {"all": {}} 2021/12/11 - 20:53:42 RX http GETACK Appliance.System.All {"all": {"system": {"hardware": {"type": "msg100", "subType": "us", "version": "3.5.0", "chipType": "mt7682", "uuid": "################################", "macAddress": "#################"}, "firmware": {"version": "3.2.5", "compileTime": "2021/09/16 13:47:36 GMT +08:00", "encrypt": 1, "wifiMac": "#################", "innerIp": "#########", "server": "##################", "port": "###", "userId": "#######"}, "time": {"timestamp": 1639274022, "timezone": "America/New_York", "timeRule": [[1636264800, -18000, 0], [1647154800, -14400, 1], [1667714400, -18000, 0], [1678604400, -14400, 1], [1699164000, -18000, 0], [1710054000, -14400, 1], [1730613600, -18000, 0], [1741503600, -14400, 1], [1762063200, -18000, 0], [1772953200, -14400, 1], [1793512800, -18000, 0], [1805007600, -14400, 1], [1825567200, -18000, 0], [1836457200, -14400, 1], [1857016800, -18000, 0], [1867906800, -14400, 1], [1888466400, -18000, 0], [1899356400, -14400, 1], [1919916000, -18000, 0], [1930806000, -14400, 1]]}, "online": {"status": 1}}, "digest": {"togglex": [{"channel": 0, "onoff": 1, "lmTime": 0}], "triggerx": [{"channel": 0, "id": "1111111111111111", "count": 2}], "timerx": [], "garageDoor": [{"channel": 0, "open": 0, "lmTime": 1639267849}]}}} 2021/12/11 - 20:53:42 auto LOG DEBUG MerossDevice(2104011676771136101348e1e9696f5c) back online! 2021/12/11 - 20:53:42 TX http GET Appliance.System.DNDMode {"DNDMode": {}} 2021/12/11 - 20:53:42 RX http GETACK Appliance.System.DNDMode {"DNDMode": {"mode": 0}} 2021/12/11 - 20:54:12 TX http GET Appliance.System.All {"all": {}} 2021/12/11 - 20:54:14 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(0) error in async_http_request: Server disconnected 2021/12/11 - 20:54:14 TX http GET Appliance.System.All {"all": {}} 2021/12/11 - 20:54:17 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(1) error in async_http_request: Server disconnected 2021/12/11 - 20:54:17 TX http GET Appliance.System.All {"all": {}} 2021/12/11 - 20:54:20 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(2) error in async_http_request: Server disconnected 2021/12/11 - 20:54:20 auto LOG DEBUG MerossDevice(2104011676771136101348e1e9696f5c) going offline! 2021/12/11 - 20:54:42 TX http GET Appliance.System.All {"all": {}} 2021/12/11 - 20:54:48 RX http GETACK Appliance.System.All {"all": {"system": {"hardware": {"type": "msg100", "subType": "us", "version": "3.5.0", "chipType": "mt7682", "uuid": "################################", "macAddress": "#################"}, "firmware": {"version": "3.2.5", "compileTime": "2021/09/16 13:47:36 GMT +08:00", "encrypt": 1, "wifiMac": "#################", "innerIp": "#########", "server": "##################", "port": "###", "userId": "#######"}, "time": {"timestamp": 1639274088, "timezone": "America/New_York", "timeRule": [[1636264800, -18000, 0], [1647154800, -14400, 1], [1667714400, -18000, 0], [1678604400, -14400, 1], [1699164000, -18000, 0], [1710054000, -14400, 1], [1730613600, -18000, 0], [1741503600, -14400, 1], [1762063200, -18000, 0], [1772953200, -14400, 1], [1793512800, -18000, 0], [1805007600, -14400, 1], [1825567200, -18000, 0], [1836457200, -14400, 1], [1857016800, -18000, 0], [1867906800, -14400, 1], [1888466400, -18000, 0], [1899356400, -14400, 1], [1919916000, -18000, 0], [1930806000, -14400, 1]]}, "online": {"status": 1}}, "digest": {"togglex": [{"channel": 0, "onoff": 1, "lmTime": 0}], "triggerx": [{"channel": 0, "id": "1111111111111111", "count": 2}], "timerx": [], "garageDoor": [{"channel": 0, "open": 0, "lmTime": 1639267849}]}}} 2021/12/11 - 20:54:48 auto LOG DEBUG MerossDevice(2104011676771136101348e1e9696f5c) back online! 2021/12/11 - 20:54:48 TX http GET Appliance.System.DNDMode {"DNDMode": {}} 2021/12/11 - 20:54:51 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(0) error in async_http_request: Server disconnected 2021/12/11 - 20:54:51 TX http GET Appliance.System.DNDMode {"DNDMode": {}} 2021/12/11 - 20:54:56 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(1) error in async_http_request: Server disconnected 2021/12/11 - 20:54:56 TX http GET Appliance.System.DNDMode {"DNDMode": {}} 2021/12/11 - 20:54:59 auto LOG INFO MerossDevice(2104011676771136101348e1e9696f5c) client connection attempt(2) error in async_http_request: Server disconnected 2021/12/11 - 20:54:59 auto LOG DEBUG MerossDevice(2104011676771136101348e1e9696f5c) going offline! 2021/12/11 - 20:55:12 TX http GET Appliance.System.All {"all": {}} 2021/12/11 - 20:55:14 auto LOG WARNING MerossDevice(2104011676771136101348e1e9696f5c) error in async_http_request: Server disconnected

Firmware is on latest 3.2.5

krahabb commented 2 years ago

Hello @TheOneOgre , I guess your device is still paired to Meross app. Were you able to set the proper device key in configuration or are you using an empty key ? If the latter, that could explain the issue. You can check this for help on how to setup the correct device key if you didn't before

TheOneOgre commented 2 years ago

Hello @TheOneOgre , I guess your device is still paired to Meross app. Were you able to set the proper device key in configuration or are you using an empty key ? If the latter, that could explain the issue. You can check this for help on how to setup the correct device key if you didn't before

It is indeed still paired to the Meross app as I'm unsure how to unlink it from Meross without deleting all the configuration settings for the device. I initial used the blank device key setup process which then asked for my Meross credentials to grab the key, which I had issues with, then I setup the device by pulling the device key from my Meross account (was the same key) and entering that into the device key field when I tried reinstalling the integration, however the result is the same, it's also using a static IP.

I was going to try and setup the MQTT way using the meross_pair app but, upon seeing how much extra effort is required to connect it to my existing MQTT server, I figured I'd just go through diagnostics here. Plus I saw that another LAN only integration with a built in MQTT server was in the works.

krahabb commented 2 years ago

At the moment I can't really figure out what's wrong. It looks by the logs the same request gets sometimes replied correctly while others not (Server Disconnected) The Server Disconnected error usually happens when the device intentionally closes the connection and I think (real guessing here) of it like a kind of security feature the device firmware puts in place when it receives a malformed message or so.

TheOneOgre commented 2 years ago

So is there anything I can do about it? Even as a workaround? Or is this something that is going to take time? If it helps, using Meross IoT from HACS allows it to stay available for 12-24 hours before having to reload the integration, whereas this integration won't work for more than a few minutes. Maybe differences in API calls could help show light on the problem? 🤷

krahabb commented 2 years ago

@TheOneOgre , I really can't figure out for sure what's going on. These are some 'trasversal' thoughts:

This issue itself should not be related to a local connectivity issue since the error is a 'Server disconnected' which means the TCP/HTTP stack was able to reach the device but then this closed the connection in a somewhat clean way. Also, I don't think the meross mqtt servers for your zone are so 'faulty' to not being reachable so often.

If you have other meross devices you could compare their behaviour in meross_lan. I've experienced (and other users too) many 'glitches' in devices which were apparently working but, in the end, were bugged and needed to be returned/substitued because of unreliable behaviours.

TheOneOgre commented 2 years ago

@krahabb Thanks for the updated thoughts.

To address a few things,

  1. I have only tried using 1 stack at any given time. If/when trying Meross_Lan, I was exclusively using that or meross_iot, only together one time to see if that made a difference (it did not)
  2. If it's losing connection, is this something I could diagnose via watching it's dns requests or connection through Wireshark and seeing the connection drop?
  3. This is my first and only Meross device, so I'm unable to compare to other ones
krahabb commented 2 years ago

@TheOneOgre , I think monitoring the whole traffic of the device through wireshark would be a nice starting point but I'm scared the effort could soon 'explode'. Anyway, If that's the case I would start monitoring trying to identify the different traffic flows from the device. To my knowledge the device should at least:

TheOneOgre commented 2 years ago

After monitoring the device with WireGaurd, nothing out of the ordinary is happening, just regular requests to NTP and Meross servers, plus extra when activating the device via the Meross app.

I do however have the issue of not being able to device log via the Meross_Lan integration as the "debug tracing" is only set for 10 minutes, and this issue can take between minutes to MANY hours before running into an issue sometimes. Is there a way to permanently enable debug tracing?

Secondly. I found that decreasing the device polling interval to 60 seconds, has kept the device available for longer, and once dropped offline, seems to come back by itself sometimes in minutes, sometimes hours. Again, very difficult issue to trace down.

krahabb commented 2 years ago

You can 'enlarge' traces by setting these 2 parameters in 'meross_lan/const.py' around row nr 35. You'll have to increase both, maybe set CONF_TRACE_MAXSIZE = really big big integer so you're not cut out of traces because of the size limit and then set the TIMEOUT to suit your needs.

CONF_TRACE_TIMEOUT = 600 # when starting a trace stop it and close the file after .. secs
CONF_TRACE_MAXSIZE = 65536 # or when MAXSIZE exceeded
TheOneOgre commented 2 years ago

So the log shows the http disconnects as it was before, then the device comes back online, and sometime later the same async_http_disconnect occurs. The timing is completely random and nothing lines up with the WireGuard log as the device seems to be doing it's regular routine as far as it's network IO is concerned. The requests go to the device, but sometime's it just isn't returned.

I think at this point I'm pretty much done trying to diagnose this, I bought this to prevent having to do a bunch of working creating an ESP1866 with reed switches and configuring MQTT myself, but if I basically have to do all of that for this device which may even still not work then, I might as well go with the ESP solution.

Great work on the integration anyways! Everything seems to be full featured and works well when it does, this seems to be a "just me" issue, which I am VERY familiar with, always seems when I try something, it doesn't work for a different reason than anyone else, and it's usually unsolvable.

Thanks for your help diagnosing with me, but like you said earlier, the effort has "exploded" on trying to get this to work. If you have any ideas at all, please let me know and I will try them as long as I have the device installed.

TheOneOgre commented 2 years ago

Well... I seemed to have fixed it lol

As I went to undo it for replacement, I realized, I may have connected to it to my wireless extender for my IoT devices, however that extender is pretty old and I've had issues with connectivity through it before, so I checked, and sure enough it was connected to it.

As soon as I switched it to my main ap for Iot, it was a lot more responsive and it has now been over 48 hours without it going down at all. Seems the wireless extender was sending/receiving malformed packages which was causing the device to block the integration.

So basically, was a simple issue as usual. Thanks again for the help!