krahabb / meross_lan

Home Assistant integration for Meross devices
MIT License

Meross Devices Constantly Becoming Unavailable, Setup With MQTT via Custom Pairer App, Hacks Device Key #206

Open timnolte opened 2 years ago

timnolte commented 2 years ago

Version of the custom_component

v2.6.1

Configuration

I am using the Mosquitto MQTT Broker, which I also use with Zigbee2MQTT. I have a unique HA user set up for each device, based on its MAC address, and I've configured this on each device using the Android Custom Pairer app.

Describe the bug

As visible in the History/Logbook, I can see the devices constantly going unavailable and then coming back online. This seems to have gotten worse with each new HA release since 2022.8. While trying to debug this I can't even get/download the diagnostics on any of the Meross devices. My devices that were successfully set up via MQTT don't seem to be connected via HTTP as well. I am unable to change the Connection Protocol to anything other than MQTT for some of the devices, since there is no Host Address and also no Device Key (they were configured with the Hack Mode), and there seems to be no way to correct this either. I have a few devices that are connected only via HTTP; these are HomeKit-compatible devices that apparently can't communicate via MQTT, and they also become unavailable. I didn't want to have any of my devices registered through the Meross Cloud, which is why I went the route of configuring them through the Custom Pairer App, but I still can't find the right setup that gets my devices connected via both HTTP and MQTT.

I'm really at a loss as to what to do other than remove all of my Meross devices, and the integration, and start all over again with no choice but to have my devices all connected to the Meross Cloud.

Debug log

Trying to provide relevant log entries with some of the data masked. I have the full raw logs that I'd be happy to provide in a more secure way.

2022-08-17 20:24:09.830 WARNING (MainThread) [homeassistant.config_entries] Config entry 'Living Room Lights (mss510x)' for meross_lan integration not ready yet: MQTT unavailable; Retrying in background
2022-08-17 20:24:09.835 WARNING (MainThread) [homeassistant.config_entries] Config entry 'MQTT Hub' for meross_lan integration not ready yet: MQTT unavailable; Retrying in background
2022-08-17 20:24:09.839 WARNING (MainThread) [homeassistant.config_entries] Config entry 'Bathroom Fan (mss510x)' for meross_lan integration not ready yet: MQTT unavailable; Retrying in background
2022-08-17 20:24:11.696 DEBUG (MainThread) [custom_components.meross_lan] MerossHttpClient(192.168.12.71): HTTP Response ({"header":{"messageId":"ab0247af137e4fef8b9c667e68c14b6a","namespace":"Appliance.System.All","method":"GETACK","payloadVersion":1,"from":"/appliance/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/publish","timestamp":1660782251,"timestampMs":989,"sign":"********************************"},"payload":{"all":{"system":{"hardware":{"type":"mss550x","subType":"us","version":"4.0.0","chipType":"MT7686","uuid":"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX","macAddress":"xx:xx:xx:xx:xx:xx"},"firmware":{"version":"4.2.2","homekitVersion":"2.0.1","compileTime":"Sep 23 2021 17:21:34","encrypt":1,"wifiMac":"xx:xx:xx:xx:xx:xx","innerIp":"192.168.12.71","server":"192.168.12.6","port":8883,"userId":0},"time":{"timestamp":1660782251,"timezone":"","timeRule":[]},"online":{"status":0,"bindId":"","who":0}},"digest":{"togglex":[{"channel":0,"onoff":0,"lmTime":1660708156}],"triggerx":[],"timerx":[]}}}}
)
2022-08-17 20:24:11.697 DEBUG (MainThread) [custom_components.meross_lan] MerossDevice(XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) back online!
2022-08-17 20:24:16.303 DEBUG (MainThread) [custom_components.meross_lan] MerossApi: MQTT RECV device_id:(XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) method:(GETACK) namespace:(Appliance.System.All)
2022-08-17 20:24:16.304 DEBUG (MainThread) [custom_components.meross_lan] MerossDevice(XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) back online!
2022-08-17 20:24:16.305 DEBUG (MainThread) [custom_components.meross_lan] MerossApi: MQTT SEND device_id:(XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) method:(GET) namespace:(Appliance.System.Runtime)
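
For context, every exchange in these logs carries a signed header. Below is a minimal sketch of building such a request, assuming the md5(messageId + key + timestamp) signing scheme used by Meross devices, with an empty device key corresponding to the 'Hack Mode' pairing; an illustration, not meross_lan's actual code:

```python
import json
import time
import uuid
from hashlib import md5

def build_request(namespace: str, method: str = "GET", key: str = "") -> dict:
    """Build a Meross LAN message like the ones logged above."""
    message_id = uuid.uuid4().hex
    timestamp = int(time.time())
    # the 'sign' header field is md5(messageId + key + timestamp)
    sign = md5(f"{message_id}{key}{timestamp}".encode()).hexdigest()
    return {
        "header": {
            "messageId": message_id,
            "namespace": namespace,
            "method": method,
            "payloadVersion": 1,
            "from": "Meross",
            "timestamp": timestamp,
            "timestampMs": 0,
            "sign": sign,
        },
        "payload": {"all": {}} if namespace == "Appliance.System.All" else {},
    }

if __name__ == "__main__":
    # the message is POSTed as text/plain to http://<device-ip>/config
    print(json.dumps(build_request("Appliance.System.All")))
```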
gary-sargent commented 5 months ago

Captured another disconnect and got the same trace - two client connections logged, but only one debug message in HA:

2024-03-18 19:01:44.167 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] Polling begin
2024-03-18 19:01:44.168 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] TX(http) GET Appliance.System.All (messageId:061dba7cdebe44c0aa5ba2422ef4706f)
2024-03-18 19:01:46.366 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] HTTP ERROR GET Appliance.System.All (messageId:061dba7cdebe44c0aa5ba2422ef4706f ClientResponseError:502, message='Bad Gateway', url=URL('http://192.168.123.100/config') attempt:0)
2024-03-18 19:01:46.366 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] TX(http) GET Appliance.System.All (messageId:061dba7cdebe44c0aa5ba2422ef4706f)
2024-03-18 19:01:46.560 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] HTTP ERROR GET Appliance.System.All (messageId:061dba7cdebe44c0aa5ba2422ef4706f ClientResponseError:502, message='Bad Gateway', url=URL('http://192.168.123.100/config') attempt:1)
2024-03-18 19:01:46.560 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] TX(http) GET Appliance.System.All (messageId:061dba7cdebe44c0aa5ba2422ef4706f)
2024-03-18 19:01:46.830 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] HTTP ERROR GET Appliance.System.All (messageId:061dba7cdebe44c0aa5ba2422ef4706f ClientResponseError:502, message='Bad Gateway', url=URL('http://192.168.123.100/config') attempt:2)
2024-03-18 19:01:46.831 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################18] Polling end
[19:01:44.124][192.168.123.200:56988] client connect
[19:01:45.124][192.168.123.200:56620] client connect
[19:01:45.349][192.168.123.200:56988] server connect 192.168.123.149:80
192.168.123.200:56988: POST http://192.168.123.149/config
    Host: 192.168.123.149
    User-Agent: HomeAssistant/2024.3.1 aiohttp/3.9.3 Python/3.12
    Accept: */*
    Accept-Encoding: gzip, deflate, br
    Content-Length: 246
    Content-Type: text/plain; charset=utf-8

    {"header":{"messageId":"061dba7cdebe44c0aa5ba2422ef4706f","namespace":"Appliance.System.All","method":"GET","payloadVersion":1,"from":"Meross","timestamp":1710788504,"timestampMs":0,"sign":"f3cfd62ea10d7ad7ce2cf555b031a833"},"payload":{"all":{}}}

 << Client disconnected.
[19:01:45.362][192.168.123.200:56988] client disconnect
[19:01:45.446][192.168.123.200:56620] server connect 192.168.123.149:80
192.168.123.200:56620: POST http://192.168.123.149/config
    Host: 192.168.123.149
    User-Agent: HomeAssistant/2024.3.1 aiohttp/3.9.3 Python/3.12
    Accept: */*
    Accept-Encoding: gzip, deflate, br
    Content-Length: 246
    Content-Type: text/plain; charset=utf-8

    {"header":{"messageId":"061dba7cdebe44c0aa5ba2422ef4706f","namespace":"Appliance.System.All","method":"GET","payloadVersion":1,"from":"Meross","timestamp":1710788504,"timestampMs":0,"sign":"f3cfd62ea10d7ad7ce2cf555b031a833"},"payload":{"all":{}}}

 << server closed connection
[19:01:46.316][192.168.123.200:56620] server disconnect 192.168.123.149:80
[19:01:46.316][192.168.123.200:56620] client disconnect
[19:01:46.325][192.168.123.200:56636] client connect
[19:01:46.466][192.168.123.200:56636] server connect 192.168.123.149:80
192.168.123.200:56636: POST http://192.168.123.149/config
    Host: 192.168.123.149
    User-Agent: HomeAssistant/2024.3.1 aiohttp/3.9.3 Python/3.12
    Accept: */*
    Accept-Encoding: gzip, deflate, br
    Content-Length: 246
    Content-Type: text/plain; charset=utf-8

    {"header":{"messageId":"061dba7cdebe44c0aa5ba2422ef4706f","namespace":"Appliance.System.All","method":"GET","payloadVersion":1,"from":"Meross","timestamp":1710788504,"timestampMs":0,"sign":"f3cfd62ea10d7ad7ce2cf555b031a833"},"payload":{"all":{}}}

 << server closed connection
[19:01:46.509][192.168.123.200:56636] server disconnect 192.168.123.149:80
[19:01:46.509][192.168.123.200:56636] client disconnect
[19:01:46.517][192.168.123.200:56640] client connect
[19:01:46.544][192.168.123.200:56640] server connect 192.168.123.149:80
192.168.123.200:56640: POST http://192.168.123.149/config
    Host: 192.168.123.149
    User-Agent: HomeAssistant/2024.3.1 aiohttp/3.9.3 Python/3.12
    Accept: */*
    Accept-Encoding: gzip, deflate, br
    Content-Length: 246
    Content-Type: text/plain; charset=utf-8

    {"header":{"messageId":"061dba7cdebe44c0aa5ba2422ef4706f","namespace":"Appliance.System.All","method":"GET","payloadVersion":1,"from":"Meross","timestamp":1710788504,"timestampMs":0,"sign":"f3cfd62ea10d7ad7ce2cf555b031a833"},"payload":{"all":{}}}

 << server closed connection
[19:01:46.779][192.168.123.200:56640] server disconnect 192.168.123.149:80
[19:01:46.780][192.168.123.200:56640] client disconnect
gary-sargent commented 5 months ago

So in the trace above there were three TX(http) entries logged, but the MITM trace shows four client connections.

gary-sargent commented 5 months ago

@krahabb I just installed the changes here https://github.com/krahabb/meross_lan/commit/ee3035d01ec050edbe3c31dae0a2eb8f087fab6d

Looking positive so far! It's only been running for 15 minutes so I need to let it run longer, but normally I'd have had a load of disconnects by now.

gary-sargent commented 5 months ago

@krahabb not a single disconnect in an hour - I'd definitely get a beta out with this change. Looks to have fixed the issue.

krahabb commented 5 months ago

Yeah, I'm refining some more fixes and going to release it asap

gary-sargent commented 5 months ago

@krahabb I have good news and bad news. The good news is my devices are MUCH more stable now - so the fix is definitely a positive one.

I am however seeing a few failures - though a lot less than normal.

What is interesting however is I'm not seeing any second and third attempts like I was before. Is it possible the retry functionality has been broken with the change? Here are all my logs with "HTTP ERROR" in them, and you'll see they are all "attempt:0" and then the device goes offline. There are no "attempt:1" or "attempt:2" entries like I had previously.

[screenshot: log entries filtered on "HTTP ERROR", all showing attempt:0]

wsw70 commented 5 months ago

I am however seeing a few failures - though a lot less than normal.

As a side note - is there a pattern when these errors appear? (number of devices maybe?)

I have a dozen devices and never saw the error in my logs.

gary-sargent commented 5 months ago

@krahabb I've just been looking again at my recent errors and compared them to my previous ones. The new ones are all timeout errors, whereas previously they were bad gateway errors. There don't appear to be any retries when the timeout occurs.

If you look at this snippet, both devices #9 and #11 hit a timeout, then immediately log "Polling end" instead of retrying.

2024-03-19 05:10:11.664 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] Polling begin
2024-03-19 05:10:11.666 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] TX(http) GET Appliance.System.All (messageId:9970fd927c5b447cb0264036c1093952)
2024-03-19 05:10:12.074 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Polling begin
2024-03-19 05:10:12.074 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] TX(http) GET Appliance.System.All (messageId:a7a86d813c21430aaf388c9583334e24)
2024-03-19 05:10:13.605 INFO (MainThread) [custom_components.blueiris.api.blue_iris_api] Updating data from BI Server (BlueIris)
2024-03-19 05:10:14.510 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling begin
2024-03-19 05:10:14.511 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] TX(http) SET Appliance.Control.Multiple (messageId:a310b4544ca642c1ad4c2afa142de531)
2024-03-19 05:10:16.691 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] HTTP ERROR GET Appliance.System.All (messageId:9970fd927c5b447cb0264036c1093952 TimeoutError: attempt:0)
2024-03-19 05:10:16.692 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] Polling end
2024-03-19 05:10:16.967 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] Polling begin
2024-03-19 05:10:16.968 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] TX(http) GET Appliance.System.All (messageId:a4199c0f36144d49a58642ac2f1bdecf)
2024-03-19 05:10:17.218 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] RX(http) GETACK Appliance.System.All (messageId:a4199c0f36144d49a58642ac2f1bdecf)
2024-03-19 05:10:17.218 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] Polling end
2024-03-19 05:10:17.267 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] Polling begin
2024-03-19 05:10:17.268 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] TX(http) SET Appliance.Control.Multiple (messageId:450d4c389b7f4ace94d049239c92b3e8)
2024-03-19 05:10:17.303 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] RX(http) SETACK Appliance.Control.Multiple (messageId:450d4c389b7f4ace94d049239c92b3e8)
2024-03-19 05:10:17.304 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] Appliance.Control.Multiple requests=2 (responses=2) expected size=2119 (actual=2119)
2024-03-19 05:10:17.305 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] Polling end
2024-03-19 05:10:17.692 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] HTTP ERROR GET Appliance.System.All (messageId:a7a86d813c21430aaf388c9583334e24 TimeoutError: attempt:0)
2024-03-19 05:10:17.693 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Polling end
2024-03-19 05:10:18.529 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] RX(http) SETACK Appliance.Control.Multiple (messageId:a310b4544ca642c1ad4c2afa142de531)
2024-03-19 05:10:18.530 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Appliance.Control.Multiple requests=3 (responses=3) expected size=2341 (actual=2490)
2024-03-19 05:10:18.534 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] TX(http) GET Appliance.System.DNDMode (messageId:d86cf4770ac24f938e8671e1a33fee09)
2024-03-19 05:10:18.620 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] RX(http) GETACK Appliance.System.DNDMode (messageId:d86cf4770ac24f938e8671e1a33fee09)
2024-03-19 05:10:18.621 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling end
krahabb commented 5 months ago

Yes, there was an 'inner' timeout retry loop which I removed to try to simplify the code, since I thought the new serialized connection model was enough to make it reliable.

krahabb commented 5 months ago

I'm pushing a commit restoring the former trick.

This anyway led me to think that you might be experiencing (too much) packet loss on your LAN, which is not recovered within the default 5 second timeout coded in meross_lan. This likely also explains why you were 'hit' by the server disconnections and had multiple connection attempts (documented in your sniffing tests) in the same time frame with the previous version of the component:

If I'm not wrong, what was happening before was something like this: the request timed out after the short 'connection timeout', and meross_lan immediately retried on a fresh TCP connection while the previous one was still pending, so several 'hidden' concurrent connections piled up at the device (as your MITM trace documents), with the device eventually dropping some of them (the ServerDisconnectedError).

Now, removing the timeout trick (together with ensuring aiohttp uses only 1 TCP connection at a time for the device) surely prevents multiple concurrent connections, but it leads back to the 'original sin': the device doesn't reply in a timely manner to the incoming request. If the TCP connection start (SYN) packet is lost (which of course can happen on any network), the TCP stack should retry a couple of times before reporting the error to the upper layers, but I don't remember the standard procedures nor the timers for this. I guess the default 'standard' timing for connection establishment is more than the 5 seconds that meross_lan expects as a timeout for the connection.

The trick was of course a trick (in the end, the 'ServerDisconnectedError' was caused by those 'hidden' multiple connections started by the algorithm); that's why I removed it, expecting that your observations and the consequent fix were enough to ensure reliable connections.

Allowing more than the (hardcoded) 5 second timeout in meross_lan could be a solution (though I'm wondering which value would be 'fine'). The drawback is that requests might 'stall' for longer than needed: if the device is really unreachable, why would we wait longer? Also, having long-living tasks active in the code is always scary, especially if the async behavior of the (meross_lan) component is not well designed. In the beginning this unsafe design was true for sure. Over time I've managed to better understand the mechanics of Python's async behavior, and even if today my design is better than it used to be, I'm still (a bit) scared of these request tasks pending completion for longer times.

As a final (maybe) consideration, the trick proved effective in avoiding the 'stall', since that first connection timeout could otherwise have resolved into a real timeout (maybe after 1 minute, or whatever the TCP default behavior states). When this happened, retrying a new connection in a short time frame (the 1 sec stated above) proved successful without waiting for the full network stack mechanics to play out.

EDIT: I'm really not sure (awaiting your testing though) what to expect from the trick restoration. Moreover, I don't have a clear idea of how to proceed (that's why I hope the issue can be better fixed in your LAN environment, if that's the reason for these underlying timeouts).

gary-sargent commented 5 months ago

@krahabb the behaviour I see is:

Poll cycle 1 - Poll device. An HTTP error occurs.
Poll cycle 2 (30 seconds later, or whatever period you have configured) - Device marked as offline.

Why is the device marked as offline in the subsequent poll cycle - why wait to do this?

Which also begs the question: why not mark the device as offline after two unsuccessful polls, instead of retrying within one poll? Could we have...

Poll cycle 1 - Poll device. An HTTP error occurs.
Poll cycle 2 - Poll device. An HTTP error occurs. Device marked as offline.

Why not do another poll in the subsequent poll cycle, instead of just marking it offline as a result of the previous poll 30 seconds ago?
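
A minimal sketch of this proposed two-strike policy (hypothetical names, not the current meross_lan logic):

```python
import random

class TwoStrikePoller:
    """Proposed behavior: mark a device offline only after two
    consecutive poll cycles fail, re-polling instead of giving up."""

    def __init__(self):
        self.failures = 0
        self.online = True

    async def poll_device(self):
        # stand-in for the real namespace polling over HTTP
        if random.random() < 0.3:
            raise TimeoutError

    async def poll_cycle(self):
        try:
            await self.poll_device()
            self.failures = 0
            self.online = True
        except TimeoutError:  # in practice: any HTTP/transport error
            self.failures += 1
            if self.failures >= 2:
                self.online = False
```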

gary-sargent commented 5 months ago

@krahabb see my comment here https://github.com/krahabb/meross_lan/issues/414#issuecomment-2011669620 describing something that can cause the plugs to sometimes be on a less favourable network connection. However, I have observed issues with ones which I know are close to the nearest access point and should have a good signal. Also, I'm not observing any ping losses (I'm pinging all devices every 20 seconds).

andker87 commented 5 months ago

This problem also plagued me for several days, until yesterday when I took courage, removed everything, and started from scratch. Everything seemed okay and the devices were back working, until a few minutes ago, when one of my three Meross 310 smart plugs, the one closest to the router (about 1 m away, 100% signal strength), went back to Unavailable status, even though it works perfectly from the Meross app and of course there is no problem or blockage on the router side.

gary-sargent commented 5 months ago

@krahabb I tried https://github.com/krahabb/meross_lan/commit/333e816f8b6aa88465297a5ef2f8833fbd85fc6b but didn't seem to get any retries. Devices #4 and #11 both had a failure, though only #11 went offline. See below:

Device 4 logs:

2024-03-21 10:58:12.813 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Polling begin
2024-03-21 10:58:12.814 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) SET Appliance.Control.Multiple (messageId:170f5a4929ff47e2a96d48ed43c5d258)
2024-03-21 10:58:18.693 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] HTTP ERROR SET Appliance.Control.Multiple (messageId:170f5a4929ff47e2a96d48ed43c5d258 TimeoutError: attempt:0)
2024-03-21 10:58:18.693 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Appliance.Control.Multiple failed with no response: requests=2 expected size=1880
2024-03-21 10:58:18.694 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) GET Appliance.System.All (messageId:18827e540fc640858741a8052d5d4ae3)
2024-03-21 10:58:24.381 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] RX(http) GETACK Appliance.System.All (messageId:18827e540fc640858741a8052d5d4ae3)
2024-03-21 10:58:24.382 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) GET Appliance.System.Runtime (messageId:2220e87c4690412fb93756e6b11a5686)
2024-03-21 10:58:24.433 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] RX(http) GETACK Appliance.System.Runtime (messageId:2220e87c4690412fb93756e6b11a5686)
2024-03-21 10:58:24.434 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Polling end
2024-03-21 10:59:24.702 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Polling begin
2024-03-21 10:59:24.702 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) GET Appliance.System.All (messageId:be6c22b9ce69489dbfb3819a44adb574)
2024-03-21 10:59:24.872 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] RX(http) GETACK Appliance.System.All (messageId:be6c22b9ce69489dbfb3819a44adb574)
2024-03-21 10:59:24.873 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Polling end

Device 11 logs:

2024-03-21 10:58:13.270 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling begin
2024-03-21 10:58:13.271 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] TX(http) SET Appliance.Control.Multiple (messageId:01affc96ae984affa04fc35c16081876)
2024-03-21 10:58:14.287 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] RX(http) SETACK Appliance.Control.Multiple (messageId:01affc96ae984affa04fc35c16081876)
2024-03-21 10:58:14.291 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Appliance.Control.Multiple requests=3 (responses=3) expected size=2340 (actual=2495)
2024-03-21 10:58:14.298 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] TX(http) GET Appliance.System.DNDMode (messageId:a8b2eda8b49d4edd86f38489f788809c)
2024-03-21 10:58:22.691 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] HTTP ERROR GET Appliance.System.DNDMode (messageId:a8b2eda8b49d4edd86f38489f788809c TimeoutError: attempt:0)
2024-03-21 10:58:22.691 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling end
2024-03-21 10:58:52.692 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling begin
2024-03-21 10:58:52.692 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Going offline!
2024-03-21 10:58:52.694 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling end

All logs during the period:

2024-03-21 10:58:12.813 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Polling begin
2024-03-21 10:58:12.814 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) SET Appliance.Control.Multiple (messageId:170f5a4929ff47e2a96d48ed43c5d258)
2024-03-21 10:58:12.834 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################8] Polling begin
2024-03-21 10:58:12.835 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################8] TX(http) SET Appliance.Control.Multiple (messageId:cd34f7ad9a48450383552592942e8e1a)
2024-03-21 10:58:12.896 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################5] Polling begin
2024-03-21 10:58:12.897 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################5] TX(http) SET Appliance.Control.Multiple (messageId:62453ba3cd6f424390c9096a670035bd)
2024-03-21 10:58:12.903 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################13] Polling begin
2024-03-21 10:58:12.903 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################13] TX(http) SET Appliance.Control.Multiple (messageId:aaad7257be5544d8b0eb647e4b5672be)
2024-03-21 10:58:13.009 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Polling begin
2024-03-21 10:58:13.010 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] TX(http) SET Appliance.Control.Multiple (messageId:e02562f79dd1485ebfc71c0c33c670d8)
2024-03-21 10:58:13.270 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling begin
2024-03-21 10:58:13.271 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] TX(http) SET Appliance.Control.Multiple (messageId:01affc96ae984affa04fc35c16081876)
2024-03-21 10:58:14.073 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] RX(http) SETACK Appliance.Control.Multiple (messageId:e02562f79dd1485ebfc71c0c33c670d8)
2024-03-21 10:58:14.074 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Appliance.Control.Multiple requests=2 (responses=2) expected size=1889 (actual=1966)
2024-03-21 10:58:14.075 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Polling end
2024-03-21 10:58:14.287 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] RX(http) SETACK Appliance.Control.Multiple (messageId:01affc96ae984affa04fc35c16081876)
2024-03-21 10:58:14.291 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Appliance.Control.Multiple requests=3 (responses=3) expected size=2340 (actual=2495)
2024-03-21 10:58:14.298 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] TX(http) GET Appliance.System.DNDMode (messageId:a8b2eda8b49d4edd86f38489f788809c)
2024-03-21 10:58:14.386 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################7] Polling begin
2024-03-21 10:58:14.388 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################7] TX(http) SET Appliance.Control.Multiple (messageId:c0645f9003ed4f9ba4d42615ff0f5b66)
2024-03-21 10:58:14.394 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] Polling begin
2024-03-21 10:58:14.395 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] TX(http) SET Appliance.Control.Multiple (messageId:fb4f83f8825f4e6cbd7e2294a66df6cf)
2024-03-21 10:58:14.439 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] Polling begin
2024-03-21 10:58:14.440 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] TX(http) SET Appliance.Control.Multiple (messageId:91ba726be3dd4f0d8a7b03722279383b)
2024-03-21 10:58:14.468 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################2] Polling begin
2024-03-21 10:58:14.468 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################2] TX(http) SET Appliance.Control.Multiple (messageId:92ef1c72e4b746b3b84368e12d7a5a4e)
2024-03-21 10:58:18.693 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] HTTP ERROR SET Appliance.Control.Multiple (messageId:170f5a4929ff47e2a96d48ed43c5d258 TimeoutError: attempt:0)
2024-03-21 10:58:18.693 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Appliance.Control.Multiple failed with no response: requests=2 expected size=1880
2024-03-21 10:58:18.694 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) GET Appliance.System.All (messageId:18827e540fc640858741a8052d5d4ae3)
2024-03-21 10:58:22.203 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] RX(http) SETACK Appliance.Control.Multiple (messageId:91ba726be3dd4f0d8a7b03722279383b)
2024-03-21 10:58:22.203 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] Appliance.Control.Multiple requests=2 (responses=2) expected size=1880 (actual=1957)
2024-03-21 10:58:22.204 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] Polling end
2024-03-21 10:58:22.691 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] HTTP ERROR GET Appliance.System.DNDMode (messageId:a8b2eda8b49d4edd86f38489f788809c TimeoutError: attempt:0)
2024-03-21 10:58:22.691 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling end
2024-03-21 10:58:23.788 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] RX(http) SETACK Appliance.Control.Multiple (messageId:fb4f83f8825f4e6cbd7e2294a66df6cf)
2024-03-21 10:58:23.788 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] Appliance.Control.Multiple requests=2 (responses=2) expected size=1880 (actual=1957)
2024-03-21 10:58:23.789 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################11] Polling end
2024-03-21 10:58:24.039 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################7] RX(http) SETACK Appliance.Control.Multiple (messageId:c0645f9003ed4f9ba4d42615ff0f5b66)
2024-03-21 10:58:24.039 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################7] Appliance.Control.Multiple requests=2 (responses=2) expected size=1879 (actual=1957)
2024-03-21 10:58:24.040 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################7] Polling end
2024-03-21 10:58:24.051 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################8] RX(http) SETACK Appliance.Control.Multiple (messageId:cd34f7ad9a48450383552592942e8e1a)
2024-03-21 10:58:24.052 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################8] Appliance.Control.Multiple requests=2 (responses=2) expected size=1880 (actual=1957)
2024-03-21 10:58:24.052 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################8] Polling end
2024-03-21 10:58:24.331 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################2] RX(http) SETACK Appliance.Control.Multiple (messageId:92ef1c72e4b746b3b84368e12d7a5a4e)
2024-03-21 10:58:24.332 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################2] Appliance.Control.Multiple requests=3 (responses=3) expected size=2209 (actual=2321)
2024-03-21 10:58:24.333 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################2] Polling end
2024-03-21 10:58:24.359 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################5] RX(http) SETACK Appliance.Control.Multiple (messageId:62453ba3cd6f424390c9096a670035bd)
2024-03-21 10:58:24.359 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################5] Appliance.Control.Multiple requests=2 (responses=2) expected size=1889 (actual=1966)
2024-03-21 10:58:24.360 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################5] Polling end
2024-03-21 10:58:24.381 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] RX(http) GETACK Appliance.System.All (messageId:18827e540fc640858741a8052d5d4ae3)
2024-03-21 10:58:24.382 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] TX(http) GET Appliance.System.Runtime (messageId:2220e87c4690412fb93756e6b11a5686)
2024-03-21 10:58:24.392 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################13] RX(http) SETACK Appliance.Control.Multiple (messageId:aaad7257be5544d8b0eb647e4b5672be)
2024-03-21 10:58:24.392 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################13] Appliance.Control.Multiple requests=2 (responses=2) expected size=1890 (actual=1968)
2024-03-21 10:58:24.392 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################13] Polling end
2024-03-21 10:58:24.433 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] RX(http) GETACK Appliance.System.Runtime (messageId:2220e87c4690412fb93756e6b11a5686)
2024-03-21 10:58:24.434 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################4] Polling end
2024-03-21 10:58:24.743 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] Polling begin
2024-03-21 10:58:24.744 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] TX(http) SET Appliance.Control.Multiple (messageId:f872139cdbdc4227be2cbe9621704824)
2024-03-21 10:58:24.989 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] RX(http) SETACK Appliance.Control.Multiple (messageId:f872139cdbdc4227be2cbe9621704824)
2024-03-21 10:58:24.990 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] Appliance.Control.Multiple requests=3 (responses=3) expected size=2341 (actual=2495)
2024-03-21 10:58:24.991 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] TX(http) GET Appliance.System.DNDMode (messageId:f90d5b4ab2ad4ecd925591c1ee07ccdf)
2024-03-21 10:58:25.075 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] RX(http) GETACK Appliance.System.DNDMode (messageId:f90d5b4ab2ad4ecd925591c1ee07ccdf)
2024-03-21 10:58:25.076 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################16] Polling end
2024-03-21 10:58:25.713 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] Polling begin
2024-03-21 10:58:25.714 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] TX(http) GET Appliance.System.All (messageId:3549fad1134c4950b939458fc31c0777)
2024-03-21 10:58:25.957 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] RX(http) GETACK Appliance.System.All (messageId:3549fad1134c4950b939458fc31c0777)
2024-03-21 10:58:25.959 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################17] Polling end
2024-03-21 10:58:26.498 INFO (MainThread) [custom_components.blueiris.api.blue_iris_api] Updating data from BI Server (BlueIris)
2024-03-21 10:58:27.174 DEBUG (MainThread) [custom_components.meross_lan.mss110nr_###############################3] Polling begin
2024-03-21 10:58:27.174 DEBUG (MainThread) [custom_components.meross_lan.mss110nr_###############################3] TX(http) GET Appliance.System.All (messageId:81b0b1430fa24f958fedee77e9752d7a)
2024-03-21 10:58:27.343 DEBUG (MainThread) [custom_components.meross_lan.mss110nr_###############################3] RX(http) GETACK Appliance.System.All (messageId:81b0b1430fa24f958fedee77e9752d7a)
2024-03-21 10:58:27.343 DEBUG (MainThread) [custom_components.meross_lan.mss110nr_###############################3] Polling end
2024-03-21 10:58:31.761 DEBUG (MainThread) [custom_components.meross_lan.msl450_###############################6] Polling begin
2024-03-21 10:58:31.763 DEBUG (MainThread) [custom_components.meross_lan.msl450_###############################6] TX(http) SET Appliance.Control.Multiple (messageId:ce2595ffc733468896d6d4dd0d261a3a)
2024-03-21 10:58:31.888 DEBUG (MainThread) [custom_components.meross_lan.msl450_###############################6] RX(http) SETACK Appliance.Control.Multiple (messageId:ce2595ffc733468896d6d4dd0d261a3a)
2024-03-21 10:58:31.889 DEBUG (MainThread) [custom_components.meross_lan.msl450_###############################6] Appliance.Control.Multiple requests=3 (responses=3) expected size=3879 (actual=4120)
2024-03-21 10:58:31.890 DEBUG (MainThread) [custom_components.meross_lan.msl450_###############################6] Polling end
2024-03-21 10:58:33.510 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################0] Polling begin
2024-03-21 10:58:33.513 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################0] TX(http) SET Appliance.Control.Multiple (messageId:5652182a007948fcb10b549a41fcc0b6)
2024-03-21 10:58:33.718 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################0] RX(http) SETACK Appliance.Control.Multiple (messageId:5652182a007948fcb10b549a41fcc0b6)
2024-03-21 10:58:33.718 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################0] Appliance.Control.Multiple requests=2 (responses=2) expected size=1879 (actual=1963)
2024-03-21 10:58:33.719 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################0] Polling end
2024-03-21 10:58:34.547 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################1] Polling begin
2024-03-21 10:58:34.552 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################1] TX(http) SET Appliance.Control.Multiple (messageId:36c0768b7d1e44a18accef9a5fb380a5)
2024-03-21 10:58:34.774 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################1] RX(http) SETACK Appliance.Control.Multiple (messageId:36c0768b7d1e44a18accef9a5fb380a5)
2024-03-21 10:58:34.774 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################1] Appliance.Control.Multiple requests=2 (responses=2) expected size=1879 (actual=1963)
2024-03-21 10:58:34.774 DEBUG (MainThread) [custom_components.meross_lan.mss210_###############################1] Polling end
2024-03-21 10:58:35.551 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################12] Polling begin
2024-03-21 10:58:35.551 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################12] TX(http) GET Appliance.System.All (messageId:5ccd84de98ac472db70a8898e740bb33)
2024-03-21 10:58:35.622 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################12] RX(http) GETACK Appliance.System.All (messageId:5ccd84de98ac472db70a8898e740bb33)
2024-03-21 10:58:35.623 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################12] Polling end
2024-03-21 10:58:41.122 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] Polling begin
2024-03-21 10:58:41.124 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] TX(http) SET Appliance.Control.Multiple (messageId:23469f0b093246b9a6690d64de1db4fd)
2024-03-21 10:58:41.160 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] RX(http) SETACK Appliance.Control.Multiple (messageId:23469f0b093246b9a6690d64de1db4fd)
2024-03-21 10:58:41.161 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] Appliance.Control.Multiple requests=2 (responses=2) expected size=2119 (actual=2119)
2024-03-21 10:58:41.162 DEBUG (MainThread) [custom_components.meross_lan.mss425f_##############################14] Polling end
2024-03-21 10:58:44.076 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Polling begin
2024-03-21 10:58:44.076 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] TX(http) GET Appliance.System.All (messageId:9f5ac93bbb9a4826acf1d7aec809312f)
2024-03-21 10:58:44.246 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] RX(http) GETACK Appliance.System.All (messageId:9f5ac93bbb9a4826acf1d7aec809312f)
2024-03-21 10:58:44.247 DEBUG (MainThread) [custom_components.meross_lan.mss110_###############################9] Polling end
2024-03-21 10:58:52.206 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] Polling begin
2024-03-21 10:58:52.208 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] TX(http) GET Appliance.System.All (messageId:0f929c3c077f4c269d3d574e85513d24)
2024-03-21 10:58:52.655 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] RX(http) GETACK Appliance.System.All (messageId:0f929c3c077f4c269d3d574e85513d24)
2024-03-21 10:58:52.655 DEBUG (MainThread) [custom_components.meross_lan.mss110_##############################10] Polling end
2024-03-21 10:58:52.692 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling begin
2024-03-21 10:58:52.692 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Going offline!
2024-03-21 10:58:52.694 DEBUG (MainThread) [custom_components.meross_lan.mss315_##############################15] Polling end
krahabb commented 5 months ago

A lot of things to go through... ;) There are no logs or indications of the 'inner' retries... they're just silently done. You can infer they went through from the delay between the logged request (TX) and the logged error. It works like this: the connection is attempted with a 1 second timeout; if it isn't established, it is retried back-to-back with a 2 second timeout, and then again with a 4 second one, before the error is finally raised and logged.

This is how it worked before the single-connection-per-host implementation. Now it is the same, except I didn't test what the library is actually doing when the code re-issues the connection attempt back-to-back after the previous connection times out.

I think you can infer whether this is working (i.e. whether it is able to recover a connection that would otherwise have failed) by checking the delay between every TX->RX pair:
- if it's more than roughly 1 sec but ends in success, then the connection timeout likely hit but was recovered by the algorithm;
- also, if you get a timeout error, it should in general be logged around 7 seconds after the TX request.
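
A sketch of that silent retry loop as described here and in the later comment (escalating only the connection timeout while each attempt keeps the fixed 5 second total budget); an illustration under those assumptions, not the actual meross_lan code:

```python
import asyncio
import aiohttp

async def post_with_retries(session: aiohttp.ClientSession, url: str, data: bytes):
    # Escalate the 'connection timeout' back-to-back: 1s + 2s + 4s = 7s max,
    # so a genuine connection failure surfaces ~7 seconds after the TX log.
    for connect_timeout in (1, 2, 4):
        timeout = aiohttp.ClientTimeout(total=5, connect=connect_timeout)
        try:
            async with session.post(url, data=data, timeout=timeout) as response:
                # the devices answer with Content-Type text/plain
                return await response.json(content_type=None)
        except asyncio.TimeoutError:
            # simplification: the real loop retries only when the *connection*
            # phase timed out; a 'total' (5s) timeout is reported to the caller
            continue
    raise TimeoutError("no connection after 3 back-to-back attempts")
```

A recovered attempt would show up exactly as the TX->RX gap of more than roughly 1 second described above.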

As for why the logic for 'declaring a device offline' appears a bit illogical: I guess you're right, I don't like it very much either, but a lot of adjustments and 'heuristics' led to this. One example for all: when any HTTP connection fails, I'm not really sure the device is offline, and the issues we're trying to fix clearly show it might be only a transient condition (or a malformed query or whatever) that recovers in a few seconds. That's why the offlining is only done in the 'next' polling cycle. That's not the only reason though: in the beginning the component was more MQTT focused, and the implementation lacked the ability to sync message responses, so the only way to check the online state was to see whether any request had gone unanswered since the previous poll (or something like that). Now the MQTT part has gained the ability to await/sync message replies, so this kind of lazy timeout evaluation could be reviewed with more straightforward logic.
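
Expressed as a sketch (hypothetical names, a reading of the description above rather than the actual implementation), the current lazy evaluation looks like:

```python
import random

class LazyOfflinePoller:
    """Current behavior: a failed cycle only takes effect at the
    beginning of the *next* cycle ('Going offline!')."""

    def __init__(self):
        self.online = True
        self.got_response = True

    async def poll_device(self):
        # stand-in for the real HTTP request (Appliance.System.All, ...)
        if random.random() < 0.3:
            raise TimeoutError

    async def poll_cycle(self):
        if self.online and not self.got_response:
            self.online = False  # logged as "Going offline!"
            return
        self.got_response = False
        try:
            await self.poll_device()
            self.got_response = True
            self.online = True
        except TimeoutError:
            pass  # logged as "HTTP ERROR ... attempt:0"; stay online for now
```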

Somehow related to this, there's also the fact that I still don't have a clear understanding of the different error messages that can be reported (talking about the HTTP connection). Sometimes the library reports a ServerDisconnected error (and now, thanks to your insights, we have identified the reason for this), sometimes we get timeouts, sometimes other sorts of errors ('Unable to connect', for example).

So, yes, it's a bit of a mess: the result of the stratification of different levels of implementation (through the years) of the 2 transports in meross_lan. Still, whenever I've tried to tidy up that code, some sort of misbehavior appeared, so I'm reluctant to re-implement it in a different way. It is not perfect, but somehow its heuristics are robust enough to make it reliable (at least in general).

One more thought: even if unlikely (I'm 99% confident it is not related), could it be that those continuous pings stress the devices' IP stack to the point of disturbing the TCP connections? ICMP packets should really not stress the device's TCP stack, but since these IoT microcontrollers are most of the time 'cheaply engineered', I would not be surprised if their implementation were so fragile that even plain ICMP packets (maybe not a single one, but the repetition over time) could lower the reliability/responsiveness of the HTTP/TCP stack.

If I understand correctly, you have these track/ping requests constantly issued to monitor the devices (or maybe you've just set them up while investigating the disconnections).

krahabb commented 5 months ago

Forgot to mention: the fact that at the beginning of the next polling cycle it just goes offline is very odd though... I'm surely going to check (and try to remember why it's so dumb...). I think it is a 'fossil' of the algorithm, mainly devoted to detecting MQTT offlining.

timnolte commented 5 months ago

@krahabb OK, so with all of this talk of changes to polling and how things are working, and no real definitive proof that all of these changes actually fix issues, is it actually safe for me to upgrade the integration on my HA instance? I'm currently on v4.5.3 and everything is working solidly for me. The last thing I want to do is upgrade and have all of these changes start causing me issues.

krahabb commented 5 months ago

Hello @timnolte, I'm sorry we have mixed in an issue which I should have put in a separate thread from the beginning. As for the current discussion and implementations, I think we're on the safe side, and this (like most of the other issues) is really helping identify and resolve bugs.

This is likely a very subtle one, so that in normal circumstances almost no disconnections happen (even though I remember the disconnection issue arising from time to time with no clear solution).

I'd say v5.0.2 (pre HTTP patch) and v5.0.3 (HTTP patch, which is likely working better than before) are both trustable/reliable. If anything could break your setup, I'd say it would likely not be related to HTTP ;)

krahabb commented 5 months ago

@gary-sargent,

I think I've identified another issue/explanation for some of the timeouts now: your device 4 times out after (around) 6 seconds on the 'Multiple' command request. This puzzled me a bit before going to bed, and this morning I had an insight: if the algorithm for the timeout is correct and the device doesn't reply at all, the logged error should come after 1+2+4 seconds (7) as reported in the previous comment. Actually the logic is a bit more complicated, since the HTTP connection library has timeouts for the various phases of the transaction (things are never so simple in real life..). There's a timeout for HTTP session establishment (the 'connection timeout') and a timeout for the full transaction, i.e. from request to complete message exchange. meross_lan sets this 'total timeout' at 5 seconds and never touches it. The algorithm for retrying with increasing timeouts only works on the 'connection timeout' (i.e. if the connection is not established, it retries with a longer connection timeout). So now, what happens in that trace is: the connection is (eventually) established and the request is sent, but the device never replies, so the 5 second 'total timeout' expires with no connection-level retry involved, and the error is logged at attempt:0.

This is strictly related to the nature of this 'Multiple' request. The same happens on MQTT (no reply at all after 5 seconds), and I've seen my devices too behave like this when the response is too big for some internal buffer limit. Now, when I was testing this new message structure I saw the device replying with ServerDisconnected, and so this was handled as an indication that the message request was too 'large' for the device to fulfill. There's a tuning algorithm in meross_lan to detect and correct this issue automatically (#399), so this kind of failure could/should happen at startup (a couple or a few more 'failed attempts' until the code estimates the correct device buffer size so as not to let it overflow).

Now, the new connection algorithm exposes the fact that, when this failure happens, the device doesn't really disconnect (the ServerDisconnected error I saw before was in fact a consequence of the device stalling and meross_lan trying to recover by immediately issuing a concurrent connection); it just times out.

In the end, the code now just recovers correctly: after getting the timeout for the 'Multiple' request, the auto-tuning splits the 'Multiple' into its 2 inner messages and carries on (you can see the requests for 'All' and 'Runtime' back to back after the timeout failure). Going on (with no restarts) you should see fewer of these failures associated with this message.
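
A sketch of that fallback (hypothetical names; the actual auto-tuning in #399 also estimates the device buffer size, which is omitted here):

```python
async def send_multiple(device, messages: list[dict]):
    """Send several requests packed in one Appliance.Control.Multiple; if the
    device stalls on the oversized payload and times out, split the pack and
    send the inner messages one by one (e.g. 'All' then 'Runtime')."""
    try:
        return await device.request("Appliance.Control.Multiple", {"multiple": messages})
    except TimeoutError:
        # the device never replied: fall back to individual requests
        return [
            await device.request(message["header"]["namespace"], message["payload"])
            for message in messages
        ]
```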

Anyway, some delays/timeouts are still present in your setup: that transaction, for instance, exposed an initial 'connection timeout' of at least 1 second on the first attempt. Also, the device 11 (or 15) timeout is 'genuine', since it is associated with a simple request for 'DND' and hits the overall connection timeout of 7 seconds, i.e. 3 successive back-to-back attempts don't get connected at all.

gary-sargent commented 5 months ago

@krahabb certainly 5.0.3 and https://github.com/krahabb/meross_lan/commit/333e816f8b6aa88465297a5ef2f8833fbd85fc6b seem to go a long way in alleviating the issue. I had 2 or 3 plugs go offline just the once overnight, but this is massively better than where this was a week ago. In a day I'll be away from home for just under two weeks, so I won't be able to contribute much until after then.