krahabb / meross_lan

Home Assistant integration for Meross devices
MIT License

Garage doors get stuck in "opening" state, sometimes with "detected problem" #498

Open neilenns opened 2 weeks ago

neilenns commented 2 weeks ago

Version of the custom_component

5.3.1

Configuration

meross_lan-5633e6531767d3e71b0746816ce953bd-Smart Garage Door (2008277429375190826748e1e92db86d)-03ef39005afa3a08f06eff6928e167d6 (1).json

Describe the bug

For the last month or so my two garage doors, which have worked fine for more than a year, are getting stuck in the "opening" state in Home Assistant. Sometimes the door will report "detected problem" in the log book, sometimes it reports "Became unavailable", and sometimes it just shows "opening" forever.

The Meross app on my phone always shows the correct state.

Debug log

Diagnostic log of opening the garage door. The log book showed the following entries for this and got stuck in the "opening" state:

[image: log book entries]

meross_lan-5633e6531767d3e71b0746816ce953bd-Smart Garage Door (2008277429375190826748e1e92db86d)-03ef39005afa3a08f06eff6928e167d6 (3).json

I think I captured this correctly; if not, please let me know.

The Home Assistant logs page shows the following error in Home Assistant Core logs:

Logger: custom_components.meross_lan.msg200_###############################0
Source: custom_components/meross_lan/helpers/__init__.py:273
integration: Meross LAN ([documentation](https://github.com/krahabb/meross_lan), [issues](https://github.com/krahabb/meross_lan/issues))
First occurred: 10:40:43 AM (1 occurrences)
Last logged: 10:40:43 AM

KeyError('state') in NamespaceHandler(Appliance.GarageDoor.State)._handle_dict: payload={}

neilenns commented 1 week ago

I tried this just now with the 5.4.0 release and it is still an issue. I get Detected problem once the door starts rolling up.

krahabb commented 1 week ago

Hello @neilenns, I see from your trace that the device replies in some 'unexpected' ways. I have other msg200 traces that differ, especially in how the garage state is reported, so I've come up with a patch that tries to address this. The fact is, the patch is just another 'guess' about how to properly query your device. I see yours is a relatively old version, so things might have changed in more recent releases (which is where the other traces come from).

Nevertheless, if you could try https://github.com/krahabb/meross_lan/releases/tag/v5.4.1-alpha.0 we could 'move forward' toward a resolution.

neilenns commented 1 week ago

Thanks, I'll give it a go. Should I just uninstall the HACS version I have and do a manual install of the alpha?

neilenns commented 1 week ago

Ok, I manually installed the new version. Unfortunately, it didn't seem to fix it:

Smart Garage Door Opener Problem 1 detected problem

And it got stuck opening again.

Let me know if I can provide other log info or some other traces to help track this down.

krahabb commented 1 week ago

Hello @neilenns, thank you for trying. You can install the 'pre-releases' through HACS too, without doing a manual install. You just have to enable 'beta repository' somewhere in HACS (I always forget how to do that, especially now that it got a big update) so that it also shows those pre-release versions and lets you install them.

As for what's next, we should get a 'diagnostic trace', which is different from the 'download diagnostics' you get from the HA core device UI. A 'diagnostic trace' is a continuous dump of the messages exchanged between meross_lan and the device itself over a period of time. You can start the trace from your device's CONFIGURE -> Diagnostics. There, check Start diagnostics trace, set the tracing duration to something manageable (say 5 minutes), and once you hit the SUBMIT button the software will start recording all the messages exchanged with this particular device and save them to a file under your custom_components/meross_lan/traces folder. Then, while the trace is running, operate your garage door through HA so that we collect data for the whole duration of the transitions (at least 1 open / 1 close). If you can, do this for every active garage door channel.

Then upload it and I'll try to better understand what's missing.

neilenns commented 1 week ago

Trace is attached, showing opening and closing one of the two garage doors. Home Assistant reported a problem on this run too, so hopefully the trace includes the message that caused it. 2024-10-15_06-46-15_01JA7ZX3WBA0MT3SV88Y1ZTHW8.csv

neilenns commented 1 week ago

Definitely not an expert, but looking at the logs, is it possible there's a discrepancy between channels for some reason?

2024/10/15 - 06:46:20   TX  http    GET Appliance.GarageDoor.State  {"state":[{"channel":1}]}
2024/10/15 - 06:46:20   RX  http    GETACK  Appliance.GarageDoor.State  {"state":{"channel":0,"doorEnable":1,"open":1,"lmTime":0}}

Looks like a request for the state of channel 1 is getting sent, but the opener is responding with channel 0.

This is different from when the command is sent to open/close the door, where the channels match:

2024/10/15 - 06:46:19   TX  http    SET Appliance.GarageDoor.State  {"state":{"channel":1,"open":1}}
2024/10/15 - 06:46:19   RX  http    SETACK  Appliance.GarageDoor.State  {"state":{"channel":1,"doorEnable":1,"open":0,"lmTime":1728992906,"execute":1}}
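
To make the comparison above repeatable, here is a minimal Python sketch that parses a TX/RX pair from the trace and checks whether the reply covers the requested channel. The column layout in the regular expression is an assumption based only on the four lines quoted here, and `channels` and `channel_mismatch` are hypothetical helper names, not part of meross_lan.

```python
import json
import re

# One trace line: timestamp, direction, transport, method, namespace, JSON payload.
# The layout is inferred from the excerpt above and may not hold for every trace.
TRACE_LINE = re.compile(
    r"^(?P<time>\S+ - \S+)\s+(?P<dir>TX|RX)\s+(?P<transport>\S+)\s+"
    r"(?P<method>\S+)\s+(?P<namespace>\S+)\s+(?P<payload>\{.*\})$"
)


def channels(payload: dict) -> set:
    """Collect channel numbers from a 'state' entry that may be a dict or a list."""
    state = payload.get("state", [])
    entries = state if isinstance(state, list) else [state]
    return {entry["channel"] for entry in entries if "channel" in entry}


def channel_mismatch(tx_line: str, rx_line: str) -> bool:
    """Return True when the reply does not cover every requested channel."""
    tx = TRACE_LINE.match(tx_line.strip())
    rx = TRACE_LINE.match(rx_line.strip())
    if tx is None or rx is None:
        raise ValueError("unrecognized trace line")
    requested = channels(json.loads(tx["payload"]))
    replied = channels(json.loads(rx["payload"]))
    return not requested <= replied


tx_get = '2024/10/15 - 06:46:20   TX  http    GET Appliance.GarageDoor.State  {"state":[{"channel":1}]}'
rx_get = '2024/10/15 - 06:46:20   RX  http    GETACK  Appliance.GarageDoor.State  {"state":{"channel":0,"doorEnable":1,"open":1,"lmTime":0}}'
print(channel_mismatch(tx_get, rx_get))  # True: asked for channel 1, got channel 0
```

Running the same check on the SET/SETACK pair returns False, since both sides use channel 1.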

krahabb commented 1 week ago

Yep, that's the issue I was trying to tame. Previously the code just requested a 'generic' Appliance.GarageDoor.State (with no channel indication), and the usual reply carried the state of all the channels, so it always contained the right channel data (which was subsequently extracted and parsed). At least, this is the behavior shown by other firmware/hardware versions of the msg200. From your initial diagnostic I saw that this 'generic' request was instead answered with that 'channel: 0', which smells a lot, so I tried to refine the query to indicate the correct channel index, but this is failing too.
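
As a rough illustration of that extraction step, here is a minimal Python sketch (not the actual meross_lan code) that picks one channel's state out of a reply, whether the reply carries a list of channels, a single dict, or nothing at all. The function name `state_for_channel` and the two-channel reply are hypothetical; only the single-dict reply and the empty payload appear in this thread.

```python
from typing import Optional


def state_for_channel(payload: dict, channel: int) -> Optional[dict]:
    """Return the state entry for `channel`, or None if the reply lacks it."""
    state = payload.get("state")  # .get() tolerates an empty payload ({})
    if state is None:
        return None
    entries = state if isinstance(state, list) else [state]
    for entry in entries:
        if entry.get("channel") == channel:
            return entry
    return None


# Hypothetical reply listing every channel: channel 1 is found.
all_channels = {"state": [{"channel": 1, "open": 0}, {"channel": 2, "open": 1}]}
print(state_for_channel(all_channels, 1))

# Reply seen in this thread: only channel 0 comes back, so channel 1 is missing.
only_zero = {"state": {"channel": 0, "doorEnable": 1, "open": 1, "lmTime": 0}}
print(state_for_channel(only_zero, 1))
```

Returning None for a missing channel, instead of indexing `payload["state"]` directly, also avoids a KeyError when the device answers with an empty payload.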

These Meross devices usually have some basic 'rules' in message structure and that's why the code generally works for very different devices and also for newer ones (to some degree...) but these rules are not set in stone of course and there are exceptions (that need to be addressed) here and there.

Now, among the exceptions I've seen so far there are other possible query formats that we could try, but before going further I'd ask you to run another test if you're up for it (if not, no problem: I can patch the code almost instantly and release it so you can just download and try).

The test basically works like this:

Let me know if this works, or if you'd prefer that I implement it directly in a new pre-release for you to test.

neilenns commented 1 week ago

Here's the response:

request:
  header:
    messageId: 0221527114ed473d85b9a98f4ef0e973
    namespace: Appliance.GarageDoor.State
    method: GET
    payloadVersion: 1
    from: /appliance/meross_lan/publish
    timestamp: 1729005664
    timestampMs: 0
    sign: a6ea18d7c4169809380e2379385244f9
  payload:
    state:
      channel: 1
response:
  header:
    messageId: 0221527114ed473d85b9a98f4ef0e973
    namespace: Appliance.GarageDoor.State
    method: GETACK
    payloadVersion: 1
    from: /appliance/2008277429375190826748e1e92db86d/publish
    timestamp: 1729005663
    timestampMs: 963
    sign: 733b6a1a0ad113e7f8206ff897144371
  payload:
    state:
      channel: 1
      doorEnable: 1
      open: 0
      lmTime: 1729000459

neilenns commented 1 week ago

(Side note: I did dig around and there's no updated firmware available for my msg200)

krahabb commented 1 week ago

> (Side note: I did dig around and there's no updated firmware available for my msg200)

Yeah, I supposed as much.

Anyway, your test looks promising: the request format seems to be working now, so we've got the correct syntax for that. I'll update the pre-release and publish it ASAP.

neilenns commented 1 week ago

Testing it now!

neilenns commented 1 week ago

Well it half works. Opening seemed to switch to open correctly, and no errors were shown. Closing however resulted in the dreaded detected problem and got stuck in the closing state. I tried twice and got the same result each time.

Here's a trace of open/close with the new change: 2024-10-15_10-01-43_01JA7ZX3WBA0MT3SV88Y1ZTHW8.csv

neilenns commented 1 week ago

Something I noticed in the logs, and in playing around with manually sending commands via the developer tools:

It looks like there's only four attempts to get the state of the door while it is closing before the add-on stops. Maybe it just needs to poll a few more times?

Edit: I tried increasing the polling to 60 seconds but that didn't make a difference.

neilenns commented 1 week ago

For completeness: once the door is fully closed if I request the state via the developer tools using {"state":{"channel":1}} I get this response:

request:
  header:
    messageId: f075760e374c4a32a847b01d3c7b0867
    namespace: Appliance.GarageDoor.State
    method: GET
    payloadVersion: 1
    from: /appliance/meross_lan/publish
    timestamp: 1729012930
    timestampMs: 0
    sign: 606a597b7d06f9364d2f30093a8b8c6d
  payload:
    state:
      channel: 1
response:
  header:
    messageId: f075760e374c4a32a847b01d3c7b0867
    namespace: Appliance.GarageDoor.State
    method: GETACK
    payloadVersion: 1
    from: /appliance/2008277429375190826748e1e92db86d/publish
    timestamp: 1729012930
    timestampMs: 508
    sign: 606a597b7d06f9364d2f30093a8b8c6d
  payload:
    state:
      channel: 1
      doorEnable: 1
      open: 0
      lmTime: 1729012165

krahabb commented 6 days ago

@neilenns,

> Edit: I tried increasing the polling to 60 seconds but that didn't make a difference.

I guess you changed the general polling period in the device configuration, but that doesn't control how the transitions are tracked.

Regardless of that setting, when you start a transition from HA, meross_lan starts querying the device about once a second (maybe less) in order to detect when it finally changes state. The different behaviors of the device (switching state immediately when opening, but delaying the switch until the end when closing) are expected and handled. This 'transition tracking', nevertheless, is carried out over a defined period based on the opening/closing times configured for the device. If those are too short (especially for the close transition), the tracking code stops before the door physically reaches the end of its travel (signaled by the hardware contact), and that's why meross_lan signals the 'problem'. You have to set the closing duration (and the opening one too, though that's less critical since the device switches state almost immediately) to a proper value, slightly longer than the time the door actually takes to close.
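
To picture why a short duration ends in a 'problem', here is a toy Python simulation of the idea. It is only a sketch and not the real tracking code: `track_transition` and `closing_door` are hypothetical names, each loop iteration stands in for one poll per second, and the 12-second door is an invented example value.

```python
from typing import Callable


def track_transition(read_open: Callable[[], int], target_open: int, duration_s: int) -> bool:
    """Poll once per simulated second, for at most `duration_s` polls.

    Returns True if the device reported `target_open` within the window,
    False if tracking gave up first (the 'problem' case).
    """
    for _ in range(duration_s):
        if read_open() == target_open:
            return True
    return False


def closing_door(seconds_to_close: int) -> Callable[[], int]:
    """Simulated device that keeps reporting open=1 until the door is fully closed."""
    elapsed = 0

    def read_open() -> int:
        nonlocal elapsed
        elapsed += 1
        return 0 if elapsed >= seconds_to_close else 1

    return read_open


# A door that needs 12 seconds to close (hypothetical value).
print(track_transition(closing_door(12), target_open=0, duration_s=4))   # False
print(track_transition(closing_door(12), target_open=0, duration_s=15))  # True
```

With a 4-second window the tracking stops while the device still reports open=1; with 15 seconds the closed state is seen before the window ends.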

neilenns commented 6 days ago

Ah! So that's what the mysterious signalClose1 and signalClose2 settings are for. I changed them to 15 seconds and everything reports correctly now. Amazing!

So looks like the one code change for the format of the message is good. Yay!