krahabb / meross_lan

Home Assistant integration for Meross devices
MIT License
450 stars 47 forks

Appliance.Control.Multiple failed with no response #399

Closed PtiBed0 closed 8 months ago

PtiBed0 commented 8 months ago

Hello, After the upgrade to last version I received the following error for all mss310 devices:

This error originated from a custom integration

Logger: custom_components.meross_lan.mss310_###############################3
Source: custom_components/meross_lan/helpers/__init__.py:246
Integration: Meross LAN (documentation, issues)
First occurred: 09:04:59 (1 occurrences)
Last logged: 09:04:59

Appliance.Control.Multiple failed with no response: requests=2 expected size=3526

Thanks

figorr commented 8 months ago

Similar issue. ~~ This error originated from a custom integration.

Logger: custom_components.meross_lan.mss425f_###############################1
Source: custom_components/meross_lan/helpers/__init__.py:246
Integration: Meross LAN (documentation, issues)
First occurred: 08:03:01 (1 occurrences)
Last logged: 08:03:01

Appliance.Control.Multiple failed with no response: requests=2 expected size=2089 ~~

krahabb commented 8 months ago

Yep, it's not an "error", it's just a "warning" saying something isn't working as expected but can happen (much like when a device is unreachable and you get a warning for that)

By design, this is expected to happen, particularly on older device firmware, right after initial loading: the integration assumes the feature works in general but, as stated, some older firmware is more prone to exposing this failure. When that happens, meross_lan adjusts itself to try to overcome these failures, so the log should not appear thereafter (or at least not as frequently)

If you can share the device type and hw/fw version of the devices showing the warning, I could get a clearer picture of how different devices behave and maybe tune this better

figorr commented 8 months ago

In my case I have a couple of "mss425f". Both with firmware version 3.1.2 and hardware version 3.0.0.

One is very close to the main router and the other one is close to an Aimesh node. Both show signal strength 100%.

PtiBed0 commented 8 months ago

For me, the warning only shows up for the mss310. I also have 2 msg100, for which there is no warning. Here is the configuration of the mss310: mss310 6.0.0 Firmware: 6.3.22 Hardware: 6.0.0

RedEarth1 commented 8 months ago

I too am getting the same warning every time I reload the integration.

Logger: custom_components.meross_lan.mss310_###############################2
Source: custom_components/meross_lan/helpers/__init__.py:246
Integration: Meross LAN (documentation, issues)
First occurred: 18:57:44 (5 occurrences)
Last logged: 19:00:00

Appliance.Control.Multiple failed with no response: requests=3 expected size=2271
Appliance.Control.Multiple failed with no response: requests=4 expected size=3290
Appliance.Control.Multiple failed with no response: requests=3 expected size=2218

MSS310 x 3 Firmware: 2.1.17 Hardware: 2.0.0

jamespreedy commented 8 months ago

Happy to put this into a separate thread if preferred. I get the same for my covers - MRS100.

mrs100 7.0.0 Firmware: 7.6.14 Hardware: 7.0.0



```
Logger: custom_components.meross_lan.mrs100_###############################4
Source: custom_components/meross_lan/helpers/__init__.py:246
Integration: Meross LAN (documentation, issues)
First occurred: 28 February 2024 at 4:21:35 pm (1 occurrences)
Last logged: 28 February 2024 at 4:21:35 pm

Appliance.Control.Multiple failed with no response: requests=2 expected size=960
```

dabbler68 commented 8 months ago

I had slightly different warnings for all my MSS310 devices (Hardware 6.0.0, Firmware 6.3.22) after installing the latest update, 5.0.1.

WARNING (MainThread) [custom_components.meross_lan.mss310_###############################7] Appliance.Control.Multiple failed with no response: requests=3 expected size=3888

After a reboot they appear one minute later, then again exactly one minute after that. No more after that.

krahabb commented 8 months ago

That's expected: 'Appliance.Control.Multiple' (support for which was just introduced in 5.0.0) carries multiple commands together but, since the exchanged data payload can grow large, it may cause an overflow or other issues in the device, and that's why this log warning appears: the device (likely because of an internal overflow) didn't respond, or responded only partially.
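To make the idea concrete, a bundled request roughly looks like one outer message carrying several ordinary requests inside it. The sketch below is only illustrative: the key names ("multiple", "header", "payload") and the chosen namespaces are assumptions for the example, not lifted from the meross_lan source.

```python
# Illustrative sketch of a bundled Appliance.Control.Multiple request that
# packs two ordinary queries into a single exchange. Key names and namespaces
# are assumptions for illustration only.
batched_request = {
    "header": {"namespace": "Appliance.Control.Multiple", "method": "SET"},
    "payload": {
        "multiple": [
            {
                "header": {"namespace": "Appliance.System.All", "method": "GET"},
                "payload": {},
            },
            {
                "header": {"namespace": "Appliance.Control.Electricity", "method": "GET"},
                "payload": {},
            },
        ]
    },
}
```

The larger the bundle, the bigger both the request and the combined reply, which is where an older firmware's internal buffers can overflow.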

When this happens (besides the warning) the code is able to recover by automatically reissuing the single messages, and that's why it is only reported as a warning... it could likely even be better logged as DEBUG since, in terms of functionality, the code is working ok (but I preferred the warning in order to gather behavior information from the 'field' :)

Also, when this happens, the code lowers its expectation of the maximum payload length, and that's why, after a while, this log should not appear anymore: once the algorithm is tuned, everything should run smoothly.

In my experience, older devices tend to 'fail' more (I have an old msl120 which is really strict and overflows with very small payload sizes but, again, after 1 or 2 logs, meross_lan is able to manage this 'tiny payload size')

Sadly, there's no way to tell exactly, for each device and fw, what the limit is, hence the 'auto-tuning' feature with the hassle of these warning logs.
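As a rough sketch of the auto-tuning just described (class and method names are hypothetical, this is not the actual meross_lan implementation): keep a per-device guess of the largest tolerated payload, shrink it whenever a bundled request gets no reply, and recover by reissuing the requests one by one.

```python
# Hypothetical sketch of the auto-tuning strategy described above;
# 'send_multiple' and 'send' are invented device methods, not real API calls.
class BundledRequestSender:
    def __init__(self, initial_size_limit: int = 4000):
        # Current guess of the largest payload this device tolerates.
        self.size_limit = initial_size_limit

    async def query(self, device, requests):
        expected = sum(len(str(r)) for r in requests)  # rough size estimate
        if expected <= self.size_limit:
            reply = await device.send_multiple(requests)
            if reply is not None:
                return reply
            # No (or partial) reply: assume the bundle was too big and lower
            # the limit so the next bundle is built smaller.
            self.size_limit = max(expected // 2, 512)
        # Recover by reissuing every request individually.
        return [await device.send(request) for request in requests]
```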

Also, as an added complexity, the allowed maximum payload size might differ depending on whether the device is queried over HTTP or over MQTT. For the latter there might be even more issues (size limits at the broker, or different internal buffer sizes in the device fw), so some devices (when configured to communicate over both protocols) might behave more erratically than just logging a few warnings at startup. This case too is managed, so that the messages are re-requested when needed.
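Since the tolerated size can differ per transport, one simple way to picture this is a limit tracked per device and per protocol (purely illustrative, the names and values are invented):

```python
# Purely illustrative: separate payload-size guesses per device and transport,
# since HTTP and MQTT may tolerate different maximums (broker limits,
# different internal buffers in the firmware). Values are invented.
size_limits = {
    ("device_1", "http"): 4000,
    ("device_1", "mqtt"): 2000,
}

def limit_for(device_id: str, transport: str) -> int:
    """Look up the tuned limit for a device/transport pair (default guess 4000)."""
    return size_limits.get((device_id, transport), 4000)
```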

Given the explanation, I'd like to keep receiving notes or updates about this behavior so that the component can be better tuned (one thing likely coming is to really 'demote' this message to the DEBUG level so as to not scare anyone)

RedEarth1 commented 8 months ago

Thanks for the update. I think a debug entry would be better if the behaviour exhibited here is a normal and expected part of the tuning process you describe. For 'normal' behaviour to generate a 'warning' that something has 'failed' is a little bit alarming, I think ... 😉

RedEarth1 commented 8 months ago

I'm now getting this warning logged constantly, once every minute, for one of my three MSS310 devices. Each entry shows a 'size' of 1898, so it doesn't even look like it's doing any kind of auto-tuning; it looks like it's stuck in a loop or something...

2024-03-08 21:18:17.699 WARNING (MainThread) [custom_components.meross_lan.mss310_###############################2] Appliance.Control.Multiple failed with no response: requests=2 expected size=1898
2024-03-08 21:19:17.870 WARNING (MainThread) [custom_components.meross_lan.mss310_###############################2] Appliance.Control.Multiple failed with no response: requests=2 expected size=1898
2024-03-08 21:20:18.069 WARNING (MainThread) [custom_components.meross_lan.mss310_###############################2] Appliance.Control.Multiple failed with no response: requests=2 expected size=1898

Reloading the integrations hasn't solved the problem; in fact, it has now started repeating every minute for all 3 devices (I reloaded all 3 instances of the integration because I didn't know which one was generating the warnings). Even after a restart of HA, one of the devices is still logging repeated warnings.

EDIT: I've now rolled back to version 4.5.3 to avoid my HA logs getting filled with these warnings

krahabb commented 8 months ago

@RedEarth1, I'm trying to release a fix as soon as possible for this 'flooding' but, in the meantime, to understand why this command started to consistently fail, could you tell me whether your devices are working over HTTP or MQTT (Meross cloud or local)?

RedEarth1 commented 8 months ago

> @RedEarth1, I'm trying to release a fix as soon as possible for this 'flooding' but, in the meantime, to understand why this command started to consistently fail, could you tell me whether your devices are working over HTTP or MQTT (Meross cloud or local)?

Hi, I believe HA is using HTTP to communicate with them because they're still accessible to the Meross cloud via the manufacturer's app.

dabbler68 commented 8 months ago

I'm not sure if this helps, but for me it is only the MSS310 devices that generate these messages. I also have MSL420 and MSL430 lamps, an MSG100 garage door controller and an HP110AHK white noise lamp (Cherub). None of these generate messages.

Update: I found this message generated 55 minutes after rebooting ... different content from the others.

2024-03-10 11:47:19.997 WARNING (MainThread) [custom_components.meross_lan.msl430_##############################10] Incorrect timestamp: -5 seconds behind HA

Before rebooting I reconnected 2 x MSL430 that had been disconnected for a month or so. I have five altogether, so I'm not sure which one this was. All have the same hardware version 4.0.0 and firmware 4.2.6.

I don't get these messages continuing like RedEarth1 does: one message per MSS310 one minute after rebooting, then one more a minute later, and that's it. Apart from the MSL430 with its single message 55 minutes after rebooting.

RedEarth1 commented 8 months ago

MSS310s are smart plugs with an energy monitor, so I guess it makes sense that they would generate more traffic than those other devices you mentioned

krahabb commented 8 months ago

@RedEarth1, yes that's true, and the fact that the log appears once every minute is a further indication the issue arises (only) when the power/energy readings are queried. I'm not sure yet why it keeps repeating, since the code is 'tuned' to take care of the bulk query for those power readings and, by the look of it, the expected payload size is within those expectations. Also, being less than 2k, I wouldn't expect the message request to fail since devices usually allow for 3-4 kbyte payloads. At any rate, the expected size is what meross_lan (me, in turn :) expects, while the real size generated by the device fw is another story. I would need a trace from those 'offending' devices, but I understand you've already downgraded, so no rush. I'm thinking about a different logging/reporting mechanism so as to still be able to diagnose issues without dumping tons of useless logs.
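To illustrate the difference between the 'expected' size and the real one: the integration can only estimate the reply size from what it assumes each namespace returns, roughly like the sketch below (the per-namespace figures and names are invented for illustration, they are not the values meross_lan actually uses):

```python
# Invented per-namespace reply-size guesses, for illustration only.
ESTIMATED_REPLY_SIZE = {
    "Appliance.Control.ToggleX": 300,
    "Appliance.Control.Electricity": 600,    # instant power readings
    "Appliance.Control.ConsumptionX": 1200,  # energy history
}

def expected_size(namespaces: list[str]) -> int:
    """Sum the guessed reply sizes; the firmware's real reply may be larger."""
    return sum(ESTIMATED_REPLY_SIZE.get(ns, 500) for ns in namespaces)

# A bundle of the two power/energy queries would be "expected" to fit well
# under 2k, yet the device could still overflow its internal buffer and
# answer with nothing, which is what the repeating warning suggests.
```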

RedEarth1 commented 8 months ago

@krahabb Hi, thanks for your help, always happy to re-upgrade and grab some logs if it helps to solve the problem :-) 2024-03-10_13-19-07_1edd1236135baa65502c0052cb2f6155.csv hope this is useful to you...

dabbler68 commented 8 months ago

The message I reported earlier about the timestamp is popping up at apparently random times. "WARNING (MainThread) [custom_components.meross_lan.msl430_##############################10] Incorrect timestamp: -5 seconds behind HA"

Now four of them a few hours apart. Does the number after the string of #### mean something? I have 10, 11, 1, 2.

I have not made any changes since my post last week but these warnings are new (other than plugging in a couple of these that had been disconnected for a few weeks).

Update: A few hours later and my MSL420 and HP110AHK have both reported the same 5 seconds time error for the first time. I don't know if I can reset the time so have just done a full hardware reboot.

krahabb commented 8 months ago

@dabbler68, that log means there's a discrepancy between the device's time and HA's time. Since all of them (usually) get their internal time from an NTP server, there's an issue somewhere in internet time synchronization. Since the log appeared for many devices, the issue might lie in HA itself (i.e. the machine running HA is not getting accurate NTP time or, more likely, is not synchronizing at all and so is slowly drifting). The number in the log reports the difference in seconds between the HA time and the device time: negative numbers mean HA time is falling behind.
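To make the interpretation concrete, here is a rough sketch of that check (the function name and the sign convention just follow the explanation above; this is not the actual meross_lan code):

```python
import time

# Rough illustration only; the name and sign convention follow the explanation
# above, they are not taken from the real meross_lan implementation.
def timestamp_drift(device_epoch: int) -> int:
    """Seconds between HA time and device time (negative: HA is falling behind)."""
    ha_epoch = int(time.time())  # clock of the machine running HA
    return ha_epoch - device_epoch

# Example: a device whose clock runs 5 seconds ahead of the HA host yields -5,
# matching the "-5 seconds" figure in the warning quoted above.
```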

RedEarth1 commented 8 months ago

I upgraded to 5.0.2, it's definitely looking better. Warnings are gone from the log, no errors for this integration, and all the functionality seems to be working OK.