krahabb / meross_lan

Home Assistant integration for Meross devices
MIT License

mss110 unresponsive #430

Closed krahabb closed 5 months ago

krahabb commented 6 months ago
> Thanks @krahabb, I guess so. No problems with any other Meross devices, only mss110 - all 16 of them. They all work fine in Apple HomeKit but all my automations are managed in HASS so this is a pain. Happy to provide some log data if you might have any suggestions...?

_Originally posted by @psteemson in https://github.com/krahabb/meross_lan/issues/427#issuecomment-2093232318_

psteemson commented 6 months ago

A small portion of the debug log, recorded while I tried turning a single mss110 device on and off, is attached. It worked on the second attempt.

home-assistant_meross_lan_2024-05-03T15-10-20.614Z.log

HASS running on VM

HAOS: v 12.2, HASS Core: v 2024.5.1, Supervisor: v 2024.04.4

Meross LAN: v 5.0.4

All mss110: Hardware v 7.0.0 Firmware v 7.3.37

krahabb commented 6 months ago

It looks as if the mss110 have big issues when connected over HTTP. meross_lan then switches over to MQTT, but the rate limiting is kicking in hard since you have a lot of devices all sharing the same rate limit of 1 message every 12 seconds (very strict, though).

We should understand why HTTP doesn't work anymore. The log reports ServerDisconnected, and this 'statistically' happens when the device is alive but unable to fulfill the request (it happens, for instance, when it receives multiple simultaneous HTTP connections).

You could try, for instance, forcing the protocol to HTTP for one of the devices and see whether it goes completely offline or not (ensure the IP address is correct, though ;)
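
Before forcing HTTP, a quick reachability check of the plug's IP address can rule out a wrong address. The snippet below is only a minimal sketch: it assumes the device's HTTP server listens on TCP port 80, and the function name `http_port_reachable` is made up for this illustration. A successful connection only shows that the port is open; it does not prove the device will answer requests (the ServerDisconnected errors happen after the connection is made).

```python
import socket


def http_port_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `http_port_reachable("192.168.1.50")` (a placeholder address, replace it with the plug's own) returns `False` when nothing accepts connections there.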

Also, when debugging, you could raise the log level to debug for only the device under investigation (not the whole integration, as it seems from your log) by entering the 'Diagnostic' settings panel when (re)configuring the device. There you can set the debug level so that it applies only to the logs for this entry and not the others, which avoids the clutter of all devices logging debug info.

psteemson commented 6 months ago

Thank you for taking a look. Sorry if log was cluttered. I'm new to this!

I think this must be down to a firmware update that happened recently. It applied only to mss110 devices and all problems have occurred since then. As I've said before, no other devices are affected so I assume they're all working fine over http. I figured the mss110 devices were failing on http and then reverting to cloud MQTT, but couldn't understand why the delay was soooooo long. Didn't realise there was a rate limit. I have 36 Meross devices overall (16 x mss110) so if they are all sharing a rate-limited broker then that's going to be a problem.
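
To put a number on that delay, here is a rough back-of-the-envelope sketch. It assumes every device queues one message at the same moment, that the first message leaves immediately, and that the shared limiter then releases one message every 12 seconds; how meross_lan actually queues messages may differ, and `worst_case_wait` is a made-up helper name.

```python
def worst_case_wait(devices: int, seconds_per_message: float) -> float:
    """Seconds the last device waits when all devices queue one message at once
    and a shared limiter releases one message every `seconds_per_message`."""
    if devices <= 0:
        return 0.0
    # The first message leaves immediately; each later one waits one more slot.
    return (devices - 1) * seconds_per_message


print(worst_case_wait(16, 12))  # 16 mss110 falling back to MQTT -> 180
print(worst_case_wait(36, 12))  # all 36 devices on one shared limit -> 420
```

So under these assumptions, the last of the 16 plugs could wait about three minutes for a single command.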

I've tried 'forcing' the use of http in device configs but then they just stop working altogether - well, the mss110 stop working. All other devices continue to work whether I force http or leave on Auto. Unless this gets sorted with a future firmware update then all my mss110 devices will be for ebay and it will be no more Meross devices for me! I'll find another make and model.

Thanks again. :)

krahabb commented 6 months ago

The good news is that the rate limiter is getting more lenient, applying the aforementioned rate limit per device instead of to the whole connection. (It looks as if, after recent upgrades of the Meross cloud infrastructure, all the devices of a single account connect to the same broker, while in the past they used to be spread over different brokers. In this scenario the rate limit currently implemented is very, very strict.)

The problem with rate limiting is that Meross doesn't publicly document it, so it's almost all guesswork. I've based my knowledge on what @albertogeniola did in his work, where he shares that the rate limit is about 1 message every 10 seconds 'on average' per device (he reports that Meross tech support shared this vague info with him).

Based on this vague notion, I've implemented the rate limiter in a very strict way: the limit is per connection (so all the devices sharing the same broker also share the rate limit) and it is not averaged, so messages are really 'pumped out' at no more than 1 message every 10 (12 to be safe) seconds.

Now, with the upcoming release, this will be computed per device and averaged over 1 minute, so that you can also have some bursts without hitting meross_lan rate limiting.
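
The difference between the two policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual meross_lan code: the class names, the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow), and the budget of 6 messages per 60-second window (derived here as 60 s / 10 s) are all made up for the example.

```python
from collections import defaultdict, deque


class StrictConnectionLimiter:
    """One message per `interval` seconds for the whole connection (no bursts)."""

    def __init__(self, interval: float = 12.0):
        self.interval = interval
        self._last_sent = None  # time of the last accepted message

    def allow(self, device_id: str, now: float) -> bool:
        # device_id is ignored on purpose: every device shares one budget.
        if self._last_sent is not None and now - self._last_sent < self.interval:
            return False
        self._last_sent = now
        return True


class AveragedDeviceLimiter:
    """Up to `max_messages` per device in any sliding `window` seconds."""

    def __init__(self, max_messages: int = 6, window: float = 60.0):
        self.max_messages = max_messages
        self.window = window
        self._sent = defaultdict(deque)  # device_id -> times of recent sends

    def allow(self, device_id: str, now: float) -> bool:
        sent = self._sent[device_id]
        # Forget sends that have slid out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.max_messages:
            return False
        sent.append(now)
        return True


strict = StrictConnectionLimiter()
averaged = AveragedDeviceLimiter()
# Two plugs each send 3 commands within the same second.
burst = [(dev, t) for t in (0.0, 0.3, 0.6) for dev in ("plug_a", "plug_b")]
print(sum(strict.allow(dev, t) for dev, t in burst))    # 1
print(sum(averaged.allow(dev, t) for dev, t in burst))  # 6
```

With the strict per-connection policy only the first of the six commands goes out right away, while the per-device averaged policy lets the whole burst through because each plug stays within its own budget.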

I hope this will not lead to bans though...

psteemson commented 6 months ago

I accept your point about the rate limiting. It is problematic for people with many devices on a single account (like me). But the real issue for me is why the mss110 devices refuse to work using http locally when all the other devices I have do so successfully. If the mss110 worked over http then there would be no need to revert to the Meross MQTT broker and I wouldn't have a problem with rate limiting. I just hope that future firmware updates on other devices do not also render them unable to use http. Then all my Meross devices will be up for sale!

psteemson commented 6 months ago

But now I think about it, if HomeKit works flawlessly (with no latency) when I'm at home, it can't be using the Meross cloud MQTT broker (otherwise it would be subject to the same rate limiting, right?). So this suggests that the problem is with HASS and not the Meross devices specifically. But then I can't escape the fact that all other Meross devices work flawlessly and it is only the mss110 that have the problem. Either way, they are no longer usable from HASS. :(

krahabb commented 6 months ago

HomeKit works locally (no MQTT, no cloud, just another local transport), so it's plausible that, if the firmware update introduced a bug, it affects only the device's HTTP stack and no other stacks.

krahabb commented 6 months ago

I'm trying to rush a preview release so that, with less rate limiting, it could reduce your issues in the meantime, but it's taking longer than forecast :(

patienttruth commented 6 months ago

I just noticed one of my mss110 went unresponsive last night before I went to bed. It would not reconnect, despite showing up on the router as connected. After playing with the WiFi settings for a few minutes I got the idea to roll core back to 2024.4.4. It has been connecting since. Three of my other mss110 plugs have had one to three 30-second periods in the past 6 hours where they went unresponsive.

So, although the core rollback seems to have helped, it has not eliminated my problem.

Of note, I also had a Mysa thermostat on a HomeKit connection that had disappeared last night. It's back now. Kinda odd since I have 6 of them, and only one wasn't showing up. Also, it seems like my Kasa light bulbs, which are somewhat finicky, have been worse off the past few days since I updated from 2024.4.4 to 2024.5.1.

Perhaps something changed on core that is exposing an underlying issue.

psteemson commented 6 months ago

> I'm trying to rush a preview release so that with less rate-limiting it could lower your issues in the meantime..but it's taking longer than forecasted :(

Good luck with the new preview release. As you say, hopefully it will help - but the real problem remains the issue that mss110 devices seem to be having with http. If they always have to use MQTT over cloud then they're never going to be as responsive as they should be in automations etc. That's obviously not your fault - between HA Core update and latest Meross firmware something seems to have gone wrong with one or both of them.

Cully81 commented 6 months ago

I recently started having the same problems with a mrs100 2.0.0 (Firmware: 2.1.4, Hardware: 2.0.0).

If I restart the Roller Shutter, access works for a few minutes, then the MRS is no longer available again.

Cully81 commented 6 months ago

Small update, it takes exactly 10 minutes, then the shutter is offline again. I have attached the debug logs.

log_mrs100.txt

krahabb commented 6 months ago

@Cully81, thank you for posting. I see the issue is that after some initial polling cycles the device stops responding, as if it's going 'nuts'... Did you recently update the fw for the device?

krahabb commented 6 months ago

@Cully81, the new 5.1.0 takes some care of 'legacy' mrs100 and tries to disable some advanced features of the protocol which might be causing the device to stall. Not sure, though, but maybe this will fix it.

patienttruth commented 6 months ago

As of a couple of days ago I solved my mss110 problem by requesting that Meross roll my firmware back to 2.1.21.

I noticed that one of my plugs wasn't having issues, and sure enough it was on the older firmware. They were accommodating; it took a few days to go through, and I had to expose the plugs to the internet for that time.

psteemson commented 6 months ago

Excellent @patienttruth. Rather proves that it is the firmware update that caused the issue in the first place. The polling limit per account or per device makes it worse but for http local control the device shouldn't be querying the cloud anyway.

The alternative workaround is to use the Home Assistant HomeKit Device integration (for the HomeKit compatible plugs of course). You add the device to HomeKit in the usual way, then remove the device from HomeKit and it will magically appear in Home Assistant. You just have to type in the pairing code and then you're up and running. Home Assistant uses the HomeKit controller and you have totally local control - no need for Meross cloud or the Meross app. Presumably you lose any ability to update firmware, but if you do have the Meross app then you could still use that to update the firmware (albeit doing that is what created my problem in the first place!).

To add it back into HomeKit from HASS you can use the HomeKit Bridge integration to expose the device back to HomeKit. A bit circular, but it works and removes the whole Meross Cloud snafu thing. Totally reliable operation and no noticeable latency.

Guenni75 commented 5 months ago

Hi. MSS210 seems to have the same problem. I can only use them with the Meross App, but Home Assistant shows the correct state.

krahabb commented 5 months ago

@Guenni75, if that is the case, the issue might be the same as in #456, which is currently fixed on the 'dev' branch. It'll take some time (a few days maybe) before I'm able to at least publish a pre-release.

Guenni75 commented 5 months ago

Don't hurry. The 5.3.0 alpha 1 solves my problems. Great work!!