home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.08k stars 29.74k forks source link

Xiaomi Aqara Gateway not receiving events #80249

Closed hanunes closed 1 year ago

hanunes commented 1 year ago

The problem

After upgrading to the new core version, aqara switches/temperature sensors/etc. don't send events or state updates to HA. The only device I see messages coming from the gateway is the Power Plug.

When I go to the Mi Home app, I see the trigger events getting logged, but they don't get passed to HA.

What version of Home Assistant Core has the issue?

core-2022.10.03

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

xiaomi_aqara

Link to integration documentation on our website

https://www.home-assistant.io/integrations/xiaomi_aqara/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2022-10-13 10:05:20.132 DEBUG (MainThread) [homeassistant.components.xiaomi_aqara] Gateway with host '10.40.0.18' connected, listening for broadcasts 2022-10-13 10:05:50.228 DEBUG (SyncWorker_8) [homeassistant.components.xiaomi_aqara.switch] Update data from hub: Plug_158d00015adfce 2022-10-13 10:05:50.234 DEBUG (SyncWorker_8) [homeassistant.components.xiaomi_aqara] PUSH >> : {'voltage': 3600, 'status': 'on', 'inuse': '1', 'power_consumed': '256227', 'load_power': '0.89'} 2022-10-13 10:06:20.229 DEBUG (SyncWorker_2) [homeassistant.components.xiaomi_aqara.switch] Update data from hub: Plug_158d00015adfce 2022-10-13 10:06:20.233 DEBUG (SyncWorker_2) [homeassistant.components.xiaomi_aqara] PUSH >> : {'voltage': 3600, 'status': 'on', 'inuse': '1', 'power_consumed': '256227', 'load_power': '0.89'} 2022-10-13 10:06:50.230 DEBUG (SyncWorker_7) [homeassistant.components.xiaomi_aqara.switch] Update data from hub: Plug_158d00015adfce 2022-10-13 10:06:50.236 DEBUG (SyncWorker_7) [homeassistant.components.xiaomi_aqara] PUSH >> : {'voltage': 3600, 'status': 'on', 'inuse': '1', 'power_consumed': '256227', 'load_power': '0.89'} 2022-10-13 10:07:20.231 DEBUG (SyncWorker_1) [homeassistant.components.xiaomi_aqara.switch] Update data from hub: Plug_158d00015adfce

Additional information

No response

syssi commented 1 year ago

Some steps to verify the multicast traffic arrives at the Home Assistant Core container.

  1. Enable SSH access on port 22222 by https://developers.home-assistant.io/docs/operating-system/debugging/
  2. Use a SSH client to login: ssh root@homeassistant.local -p 22222
  3. Switch into the HA core container: docker exec -it homeassistant bash
  4. Start tcpdump to dump the multicast traffic on port 9898: tcpdump -X udp port 9898

The traffic should look like this:

$ tcpdump -X udp port 9898           
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on enp0s3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
17:49:38.665358 IP lumi-gateway-v3_miio56782504.fritz.box.4321 > 224.0.0.50.9898: UDP, length 103
        0x0000:  4500 0083 5f47 0000 ff11 b9c2 c0a8 0185  E..._G..........
        0x0010:  e000 0032 10e1 26aa 006f 6a23 7b22 636d  ...2..&..oj#{"cm
        0x0020:  6422 3a22 7265 706f 7274 222c 226d 6f64  d":"report","mod
        0x0030:  656c 223a 226d 6167 6e65 7422 2c22 7369  el":"magnet","si
        0x0040:  6422 3a22 3135 3864 3030 3031 3136 6632  d":"158d000116f2
        0x0050:  6262 222c 2273 686f 7274 5f69 6422 3a39  bb","short_id":9
        0x0060:  3138 342c 2264 6174 6122 3a22 7b5c 2273  184,"data":"{\"s
        0x0070:  7461 7475 735c 223a 5c22 6f70 656e 5c22  tatus\":\"open\"
        0x0080:  7d22 7d                                  }"}
17:49:43.049216 IP lumi-gateway-v3_miio56782504.fritz.box.4321 > 224.0.0.50.9898: UDP, length 136
        0x0000:  4500 00a4 5f4c 0000 ff11 b99c c0a8 0185  E..._L..........
        0x0010:  e000 0032 10e1 26aa 0090 01d2 7b22 636d  ...2..&.....{"cm
        0x0020:  6422 3a22 6865 6172 7462 6561 7422 2c22  d":"heartbeat","
        0x0030:  6d6f 6465 6c22 3a22 6761 7465 7761 7922  model":"gateway"
        0x0040:  2c22 7369 6422 3a22 3334 6365 3030 3838  ,"sid":"34ce0088
        0x0050:  3931 3439 222c 2273 686f 7274 5f69 6422  9149","short_id"
        0x0060:  3a22 3022 2c22 746f 6b65 6e22 3a22 3649  :"0","token":"6I
        0x0070:  7676 576f 6f6b 7353 5774 6536 5948 222c  vvWooksSWte6YH",
        0x0080:  2264 6174 6122 3a22 7b5c 2269 705c 223a  "data":"{\"ip\":
        0x0090:  5c22 3139 322e 3136 382e 312e 3133 335c  \"192.168.1.133\
        0x00a0:  227d 227d                                "}"}
17:49:43.666540 IP lumi-gateway-v3_miio56782504.fritz.box.4321 > 224.0.0.50.9898: UDP, length 104
        0x0000:  4500 0084 5f4d 0000 ff11 b9bb c0a8 0185  E..._M..........
        0x0010:  e000 0032 10e1 26aa 0070 180f 7b22 636d  ...2..&..p..{"cm
        0x0020:  6422 3a22 7265 706f 7274 222c 226d 6f64  d":"report","mod
        0x0030:  656c 223a 226d 6167 6e65 7422 2c22 7369  el":"magnet","si
        0x0040:  6422 3a22 3135 3864 3030 3031 3136 6632  d":"158d000116f2
        0x0050:  6262 222c 2273 686f 7274 5f69 6422 3a39  bb","short_id":9
        0x0060:  3138 342c 2264 6174 6122 3a22 7b5c 2273  184,"data":"{\"s
        0x0070:  7461 7475 735c 223a 5c22 636c 6f73 655c  tatus\":\"close\
        0x0080:  227d 227d                                "}"}
17:49:48.665451 IP lumi-gateway-v3_miio56782504.fritz.box.4321 > 224.0.0.50.9898: UDP, length 103
        0x0000:  4500 0083 5f51 0000 ff11 b9b8 c0a8 0185  E..._Q..........
        0x0010:  e000 0032 10e1 26aa 006f 6a23 7b22 636d  ...2..&..oj#{"cm
        0x0020:  6422 3a22 7265 706f 7274 222c 226d 6f64  d":"report","mod
        0x0030:  656c 223a 226d 6167 6e65 7422 2c22 7369  el":"magnet","si
        0x0040:  6422 3a22 3135 3864 3030 3031 3136 6632  d":"158d000116f2
        0x0050:  6262 222c 2273 686f 7274 5f69 6422 3a39  bb","short_id":9
        0x0060:  3138 342c 2264 6174 6122 3a22 7b5c 2273  184,"data":"{\"s
        0x0070:  7461 7475 735c 223a 5c22 6f70 656e 5c22  tatus\":\"open\"
        0x0080:  7d22 7d
syssi commented 1 year ago

I would be happy if affected people could provide the output of:

jq '.data .entries [] | select(.domain == "xiaomi_aqara")' /config/.storage/core.config_entries

The output should look like this:

$ jq '.data .entries [] | select(.domain == "xiaomi_aqara")' /config/.storage/core.config_entries
{
  "entry_id": "1feb1c8b30d0f5195bdac086fe414b95",
  "version": 1,
  "domain": "xiaomi_aqara",
  "title": "Xiaomi Aqara Gateway",
  "data": {
    "host": "192.168.1.133",
    "port": 9898,
    "mac": "34:ce:00:88:91:49",
    "interface": "any",
    "protocol": "1.1.2",
    "key": "c6zvab1ou1nlgxoq",
    "sid": "34ce00889149"
  },
  "options": {},
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "zeroconf",
  "unique_id": "34:ce:00:88:91:49",
  "disabled_by": null
}

Please feel free to remove the mac address, sid, encryption key and host. The most important part is the interface, protocol version and source.

fastrax commented 1 year ago
{
        "entry_id": "e2abd4ec1c35011398ed850fee6b60df",
        "version": 1,
        "domain": "xiaomi_aqara",
        "title": "Xiaomi Aqara Gateway",
        "data": {
          "host": "192.168.0.178",
          "port": 9898,
          "mac": "REMOVED",
          "interface": "any",
          "protocol": "1.1.2",
          "key": "REMOVED",
          "sid": "REMOVED"
        },
        "options": {},
        "pref_disable_new_entities": false,
        "pref_disable_polling": false,
        "source": "user",
        "unique_id": "REMOVED",
        "disabled_by": null
      }
syssi commented 1 year ago

Let's use ngrep instead of tcpdump for further tests:

$ apk --no-cache --update --verbose add ngrep
$ ngrep port 9898 or port 4321
interface: enp0s3 (192.168.1.0/255.255.255.0)
filter: ( port 9898 or port 4321 ) and ((ip || ip6) || (vlan && (ip || ip6)))
#
U 192.168.1.133:4321 -> 224.0.0.50:9898 #1
  {"cmd":"report","model":"magnet","sid":"158d000116f2bb","short_id":9184,"data":"{\"status\":\"open\"}"}                                                           
#
U 192.168.1.133:4321 -> 224.0.0.50:9898 #2
  {"cmd":"heartbeat","model":"gateway","sid":"34ce00889149","short_id":"0","token":"gI8H3cjtF3AN4DkP","data":"{\"ip\":\"192.168.1.133\"}"}                          
#
U 192.168.1.133:4321 -> 224.0.0.50:9898 #3
  {"cmd":"report","model":"magnet","sid":"158d000116f2bb","short_id":9184,"data":"{\"status\":\"close\"}"}                                                          
#
U 192.168.1.133:4321 -> 224.0.0.50:9898 #4
  {"cmd":"report","model":"magnet","sid":"158d000116f2bb","short_id":9184,"data":"{\"status\":\"open\"}"}                                                           
#
U 192.168.1.133:4321 -> 224.0.0.50:9898 #5
  {"cmd":"heartbeat","model":"gateway","sid":"34ce00889149","short_id":"0","token":"mmssbyKbD7R5tV0G","data":"{\"ip\":\"192.168.1.133\"}"}

The output is more readable and easier to redact. @fastrax Please keep in mind the redacted values of your capture above are still there but hex encoded in the middle column.

syssi commented 1 year ago

@fastrax Your traffic capture above shows unicast traffic only (from a single host address to a single host address: 192.168.0.178 <-> 192.168.0.xxx). On HA startup all Xiaomi sub-devices gets polled once. The Xiaomi Aqara integration sends a read ({"cmd": "read", "sid": "identifier-of-the-subdevice"}) command to the gateway and the gateway responds per request with a read_ack plus the sensor value(s). All further updates should arrive without polling (-> PUSH) using multicast traffic.

My traffic capture shows the multicast traffic only:

U 192.168.1.133:4321 -> 224.0.0.50:9898 #3
  {"cmd":"report","model":"magnet","sid":"158d000116f2bb","short_id":9184,"data":"{\"status\":\"close\"}"}                                                          
#
U 192.168.1.133:4321 -> 224.0.0.50:9898 #4
  {"cmd":"report","model":"magnet","sid":"158d000116f2bb","short_id":9184,"data":"{\"status\":\"open\"}"}  

The gateway (192.168.1.133) publishes the sensor state ("status": "open") of a subdevice ("sid": "158d000116f2bb") as report to a multicast address 224.0.0.50. The HA core must be able to receive these multicast (push) messages to update the sensor state of the different entities on receive of new reports.

fastrax commented 1 year ago

Any reason why multicast push messages would stop?

I have done a fresh install using VDI method suggested above. I relinked the gateway to another router. Still no joy.

syssi commented 1 year ago

@fastrax I will downgrade the HA core (of the HA OS / VDI setup) today and will provide the steps how to do it. I would like to compare the traffic (between before and after).

fastrax commented 1 year ago

I have it fixed. I feel terrible.

The device that has my virtualbox was connecting to the wrong wifi. #worldsworstperson.

syssi commented 1 year ago

No problem! Good to know the root cause of your issue was another one. Are you able to receive the multicast packets now?

fastrax commented 1 year ago

Yes, all back up and running. Thank you for your support and patience. Hugely appreciated.

Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Sebastian Muszynski @.> Sent: Sunday, October 30, 2022 8:37:03 PM To: home-assistant/core @.> Cc: fastrax @.>; Mention @.> Subject: Re: [home-assistant/core] Xiaomi Aqara Gateway not receiving events (Issue #80249)

No problem! Good to know the root cause of your issue was another one. Are you able to receive the multicast packets now?

— Reply to this email directly, view it on GitHubhttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhome-assistant%2Fcore%2Fissues%2F80249%23issuecomment-1296150865&data=05%7C01%7C%7Ca4c617fab5a54114ab6708daba498a6d%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C638027122253573302%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=1IUECeTclplx5cwQ72k2ReeNvMjPO5blmEmOQQPNq50%3D&reserved=0, or unsubscribehttps://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FABLXYNSVEXJVTPEHARUTFELWFYJQ7ANCNFSM6AAAAAARED6SZQ&data=05%7C01%7C%7Ca4c617fab5a54114ab6708daba498a6d%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C638027122253729500%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=mOASqduVnmlaz4rf95wDQB3zc%2FCITTL7%2FnKh9us1a5g%3D&reserved=0. You are receiving this because you were mentioned.Message ID: @.***>

anxix commented 1 year ago

Called in 2022.09:

bash-5.1# jq '.data .entries [] | select(.domain == "xiaomi_aqara")' /config/.storage/core.config_entries
{
  "entry_id": "a75164ae135a43df8aa18cb87f567d70",
  "version": 1,
  "domain": "xiaomi_aqara",
  "title": "Xiaomi Aqara Gateway",
  "data": {
    "host": "192.168.X.Y",
    "interface": "192.168.X.Y",
    "key": "KEY",
    "mac": "MAC",
    "port": 9898,
    "protocol": "1.1.2",
    "sid": "SID"
  },
  "options": {},
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "zeroconf",
  "unique_id": "UNIQUE",
  "disabled_by": null
}

On 2022.10:

{
  "entry_id": "a75164ae135a43df8aa18cb87f567d70",
  "version": 1,
  "domain": "xiaomi_aqara",
  "title": "Xiaomi Aqara Gateway",
  "data": {
    "host": "192.168.X.Y",
    "interface": "192.168.X.Y",
    "key": "KEY",
    "mac": "MAC",
    "port": 9898,
    "protocol": "1.1.2",
    "sid": "SID"
  },
  "options": {},
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "zeroconf",
  "unique_id": "UNIQUE",
  "disabled_by": null
}

ngrep on 2022.10:

bash-5.1# ngrep port 9898 or port 4321
interface: eno1 (192.168.X.A/255.255.255.0)
filter: ( port 9898 or port 4321 ) and ((ip || ip6) || (vlan && (ip || ip6)))
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #1
  {"cmd":"heartbeat","model":"gateway","sid":"aaaaaaaaaaaa","short_id":"0","token":"22z3HT06N05x8efz","data":"{\"
  ip\":\"192.168.X.Y\"}"}                                                                                       
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #2
  {"cmd":"heartbeat","model":"gateway","sid":"aaaaaaaaaaaa","short_id":"0","token":"xnAILbBpMeQTuWyd","data":"{\"
  ip\":\"192.168.X.Y\"}"}                                                                                       
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #3
  {"cmd":"heartbeat","model":"gateway","sid":"aaaaaaaaaaaa","short_id":"0","token":"JBMdJGf0wjU6eShD","data":"{\"
  ip\":\"192.168.X.Y\"}"}                                                                                       
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #4
  {"cmd":"report","model":"sensor_motion.aq2","sid":"bbbbbbbbbbbbbb","short_id":9197,"data":"{\"lux\":\"407\"}"} 
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #5
  {"cmd":"report","model":"sensor_motion.aq2","sid":"bbbbbbbbbbbbbb","short_id":9197,"data":"{\"status\":\"motion
  \"}"}                                                                                                          
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #6
  {"cmd":"heartbeat","model":"gateway","sid":"aaaaaaaaaaaa","short_id":"0","token":"Glkg2X5QIxzjbQTT","data":"{\"
  ip\":\"192.168.X.Y\"}"}     

This is recorded while the sensors are working correctly. Currently 12 minutes uptime and sensors so far working, 2022.10.5. Presumably they will stop later.

syssi commented 1 year ago

@anxix So far everything looks healthy & correct. Feel free to remove the tcpdump output from your post.

anxix commented 1 year ago

They've gone offline again, 12 minutes ago. They worked for about 2.5 hours this time. I'll check the monitoring when I get home again.

anxix commented 1 year ago

The events appear to still be coming in, including motion detection:

U 192.168.X.Y:4321 -> 224.0.0.50:9898 #2761
  {"cmd":"report","model":"sensor_motion.aq2","sid":"bbbbbbbbbbbbbb","short_id":53077,"data":"{\"lux\":\"7\"}"}  
#
U 192.168.:4321 -> 224.0.0.50:9898 #2762
  {"cmd":"report","model":"sensor_motion.aq2","sid":"bbbbbbbbbbbbbb","short_id":53077,"data":"{\"status\":\"motio
  n\"}"}                                                                                                         
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #2763
  {"cmd":"heartbeat","model":"gateway","sid":"aaaaaaaaaaa","short_id":"0","token":"i7DNBNu0CxP0s6a4","data":"{\"
  ip\":\"192.168.X.Y\"}"}                                                                                       
#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #2764
  {"cmd":"heartbeat","model":"gateway","sid":"aaaaaaaaaaa","short_id":"0","token":"r8JHCSyMEau8E9PN","data":"{\"
  ip\":\"192.168.X.Y\"}"} X.Y                 

Even so, the dashboard displays them as "Unavailable" and automations that are supposed to happen when motion is detected, are not executed. The logs in the HA UI don't show anything that looks in any way related to this. Going back to 2022.09 again.

syssi commented 1 year ago

@anxix This is interesting. Could you increase the log level of the xiaomi_aqara. See https://github.com/home-assistant/core/issues/80249#issuecomment-1279986663

The multicast traffic should become visible as log messages identified by the code word PUSH. Is the integration still able to decode the multicast messages? I guess yes but the state updates aren't applied to the sensor entities. Could you verify this assumption? Thanks in advance!

anxix commented 1 year ago

I've configured the logging, updated and will let you know what the logs say once it stops working again.

anxix commented 1 year ago

This is interesting. I get a message again 2.5h after update & restart that they're offline. Based on the log however, it seems like actions stopped way before:

<< This is after reboot>>
2022-10-31 13:26:29.409 INFO (SyncWorker_6) [xiaomi_gateway] Found 16 devices
2022-10-31 13:26:29.410 DEBUG (SyncWorker_6) [xiaomi_gateway] Discovery attempt 1/5
2022-10-31 13:26:29.410 DEBUG (SyncWorker_6) [xiaomi_gateway] _send_cmd >> b'{"cmd":"read","sid":"158daaaaaaaaaaaa"}'
2022-10-31 13:26:29.753 DEBUG (SyncWorker_6) [xiaomi_gateway] _send_cmd resp << {'cmd': 'read_ack', 'model': 'switch', 'sid': '158daaaaaaaaaaaa', 'short_id': 2476, 'data': '{"voltage":2762}'}

<<Snip to last logs>>

2022-10-31 14:11:00.364 DEBUG (SyncWorker_6) [homeassistant.components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaad985: 39>: {'voltage': 2995, 'status': 'open'}
2022-10-31 14:11:00.365 DEBUG (SyncWorker_6) [homeassistant.components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaad985: on>: {'voltage': 2995, 'status': 'open'}
2022-10-31 14:11:30.361 DEBUG (SyncWorker_4) [homeassistant.components.xiaomi_aqara.binary_sensor] Updating xiaomi sensor (aaaaaaaaad985) by polling
2022-10-31 14:11:30.362 DEBUG (SyncWorker_4) [xiaomi_gateway] _send_cmd >> b'{ "cmd":"read","sid":"aaaaaaaaad985"}'
2022-10-31 14:11:30.366 DEBUG (SyncWorker_4) [xiaomi_gateway] _send_cmd resp << {'cmd': 'read_ack', 'model': 'magnet', 'sid': 'aaaaaaaaad985', 'short_id': 14507, 'data': '{"voltage":2995,"status":"close"}'}
2022-10-31 14:11:30.366 DEBUG (SyncWorker_4) [xiaomi_gateway] read_ack << {'cmd': 'read_ack', 'model': 'magnet', 'sid': 'aaaaaaaaad985', 'short_id': 14507, 'data': '{"voltage":2995,"status":"close"}'}
2022-10-31 14:11:30.367 DEBUG (SyncWorker_4) [homeassistant.components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaad985: 39>: {'voltage': 2995, 'status': 'close'}
2022-10-31 14:11:30.367 DEBUG (SyncWorker_4) [homeassistant.components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaad985: on>: {'voltage': 2995, 'status': 'close'}
2022-10-31 16:01:30.254 WARNING (MainThread) [homeassistant.components.automation.notify_if_xiaomi_stuff_is_offline] Notify if Xiaomi stuff is offline: Already running

Note how the last message log message came it at 14:11, but the offline notifications came in at 16:01. There's a 10 minute delay between the items going offline, and the system sending out a notification they're offline - i.e. they must have been reported as offline at 15:51.

It looks like it works after reboot properly for about 45 minutes (there's a few DEBUG messages coming in every minute), then for another 1h40 it doesn't work, but it doesn't mark the components as not available (some type of grace period in HA?), and only after that it marks them as unavailable.

FWIW, ngrep still shows events coming in if I walk past a motion sensor for example:

#
U 192.168.X.Y:4321 -> 224.0.0.50:9898 #2
  {"cmd":"report","model":"sensor_motion.aq2","sid":"aaaaaaaaaa","short_id":9197,"data":"{\"status\":\"motion
  \"}"}                                                                                                          
syssi commented 1 year ago

I would like to share an important implementation detail so your findings make more sense. ;-)

https://github.com/home-assistant/core/blob/dev/homeassistant/components/xiaomi_aqara/__init__.py#L59

The xiaomi_aqara implementation attaches a timer to each sensor/entity. The timer starts a countdown with 150 minutes. Each time a sensor/entity gets updated (PUSH) the timer will be resetted to 150 minutes. In other words: HA entities will be reported as unavailable after 150 minutes of inactivity / with a delay of 150 minutes.

anxix commented 1 year ago

Oh, I should correct: the offline notifications are sent after 5 rather than 10 minutes of components reporting as unavailable.

That means they went offline at 15:56, which is exactly 2.5h after the boot time. That suggests combined with the 150 minute delay that the implementation internally thought they've timed out pretty much immediately after start, even though messages came in and the debug logging showed PUSH stuff coming in for another 45 minutes.

HA dashboard just can't be trusted on this. That makes sense, because my experience after the first update (which was in the evening) showed it being broken very early on, which was empirically noticeable because the motion activated lights wouldn't go on. Don't notice it during the day because of course the lights aren't set to motion activate during daylight, so I was relying on the dashboard.

syssi commented 1 year ago

It looks like resetting the timer on incoming messages doesn't work anymore. Conclusion: Your Aqara devices are reported as unavailable after 2,5 hours always(?). ngrep reports incoming messages and you can see these messages as PUSH log messages at the log. This message stream never stops but the entities are reported as unavailable nevertheless. Correct?

anxix commented 1 year ago

2.5h: I have two clear situations where they were marked unavailable after 2.5h. The first time it happened, I don't have the exact time, but it was definitely a few hours - 2.5h sounds plausible for that time too.

ngrep: yes, ngrep shows messages continues to come in even while they're shown as unavailable.

Log: PUSH messages in the log do stop at some point. Today they stopped after 45min. This while ngrep still showed them coming in.

anxix commented 1 year ago

Right, after looking at the logs, I can confirm the first time they were reported unavailable after exactly 2.5h. The server started at 23 sept 17:14, they went missing at 19:44. That gives three tests all reporting unavailable after 150 minutes.

syssi commented 1 year ago

Could you place a copy of the xiaomi_aqara component folder to your config/custom_components directory? This should override the official component. On startup of Home Assistant you should see a notice about the override / usage of the custom component. The folder structure should look like this:

$ find custom_components/xiaomi_aqara
custom_components/xiaomi_aqara
custom_components/xiaomi_aqara/light.py
custom_components/xiaomi_aqara/sensor.py
custom_components/xiaomi_aqara/__init__.py
custom_components/xiaomi_aqara/strings.json

Lets disable the unavailable tracking now by returning false always:

    @callback
    def _async_track_unavailable(self):
        return False

https://github.com/home-assistant/core/blob/dev/homeassistant/components/xiaomi_aqara/__init__.py#L340-L350

anxix commented 1 year ago

I've added the files in config:

bash-5.1# find custom_components/xiaomi_aqara/
custom_components/xiaomi_aqara/
custom_components/xiaomi_aqara/light.py
custom_components/xiaomi_aqara/sensor.py
custom_components/xiaomi_aqara/strings.json
custom_components/xiaomi_aqara/__init__.py

Changed the init.py:

    @callback
    def _async_track_unavailable(self):
        return False

upgraded to 2022.10.5 again. After server restart I didn't notice any notice about override, also not in the logs. How do I find out which version of the plugin it's using?

anxix commented 1 year ago

I've added a _LOGGER.debug("CUSTOM XIAOMI AQARA") call in init.py, setup method just before the return and it doesn't show up in the logs. I think it's not overriding the component with the one from custom_components.

anxix commented 1 year ago

The following files were required as well:

const.py
manifest.json
config_flow.py
binary_sensor.py
switch.py
lock.py
cover.py

Additionally in manifest.json I had to add a version entry:

, "version": "10.0.0"

Now I see in the logs it does use this version:

2022-10-31 19:26:15.706 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration xiaomi_aqara which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant

There's still a warning that [homeassistant.helpers.service] Unable to find services.yaml for the xiaomi_aqara integration, but that one seems to be just a generic config thing and not important.

With this though, I don't notice a change in behaviour - sensors appear online in the dashboard, but don't react if I walk in front of them for example. I've added a logger to _async_track_unavailable:

    @callback
    def _async_track_unavailable(self):
        _LOGGER.debug("_async_track_unavailable")
        return False

and don't see anything in the logs. My logging call at the bottom of the setup function doesn't appear in the logs neither.

syssi commented 1 year ago

All files are required. Please use a full copy of the component folder.

anxix commented 1 year ago

Right. I've also added to the logger entry in the config:

    custom_components.xiaomi_aqara: debug
    custom_components.sensor.xiaomi_aqara: debug
    custom_components.switch.xiaomi_aqara: debug
    custom_components.binary_sensor.xiaomi_aqara: debug    

Now I can see in the logs the custom logging I've added, so it's using it. The sensors continue to not get any new states. They receive some state at startup depending on whether the door is open etc., then stay in that state.

syssi commented 1 year ago

To force a sensor update per PUSH please return True at the _async_track_unavailable method instead of False.

anxix commented 1 year ago

Changed that as well. Rebooted. Don't notice any difference in behaviour. The sensors don't react, or react highly unreliably (one door sensor reacted a bit, none of the motion sensors did) - they just stick with their initial state. I don't see any PUSH calls for the motion sensors, only the one door sensor that's open shows up. Might have to do with my logging config for the custom_components, I've guessed at its syntax.

Here's the door sensor that worked for a bit:

022-10-31 19:55:33.221 DEBUG (SyncWorker_7) [custom_components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaaaa: on>: {'voltage': 2925, 'status': 'open'}
2022-10-31 19:56:03.202 DEBUG (SyncWorker_6) [custom_components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaaaa: 25>: {'voltage': 2925, 'status': 'open'}
2022-10-31 19:56:03.203 DEBUG (SyncWorker_6) [custom_components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaaaa: on>: {'voltage': 2925, 'status': 'open'}
2022-10-31 19:56:33.206 DEBUG (SyncWorker_1) [custom_components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaaaa: 25>: {'voltage': 2925, 'status': 'open'}
2022-10-31 19:56:33.207 DEBUG (SyncWorker_1) [custom_components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaaaa: on>: {'voltage': 2925, 'status': 'open'}
2022-10-31 19:57:03.213 DEBUG (SyncWorker_3) [custom_components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaaaa: 25>: {'voltage': 2925, 'status': 'close'}
2022-10-31 19:57:03.213 DEBUG (SyncWorker_3) [custom_components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaaaa: on>: {'voltage': 2925, 'status': 'close'}

The door was indeed open at boot time. I closed it for a few seconds, that was not registered (normally it reacts quite quickly), then reopened. After that I closed it for a longer time (maybe a minute or so). That did register. Then I reopened and it's been open for 9-10 minutes now. It still shows up as closed and no PUSH message has been logged for it going to open state.

syssi commented 1 year ago

We are poking around here: https://github.com/home-assistant/core/blob/dev/homeassistant/components/xiaomi_aqara/__init__.py#L356-L360

By returning True I wanted to make sure to hit the self.async_write_ha_state() call because this propagates the state update. Could you log data, raw_data and the return value is_data? May be the PUSH messages aren't decoded properly.

anxix commented 1 year ago

With the following change (and still return True in _async_track_unavailable):

    @callback
    def push_data(self, data, raw_data):
        """Push from Hub."""
        _LOGGER.debug("PUSH >> %s: %s: %s", self, data, raw_data)
        was_unavailable = self._async_track_unavailable()
        try:
            is_data = self.parse_data(data, raw_data)
            _LOGGER.debug("PUSH >> is_data=%s", is_data)
        except:
            _LOGGER.debug("PUSH >> parse_data failed")
        is_voltage = self.parse_voltage(data)
        if is_data or is_voltage or was_unavailable:
            self.async_write_ha_state()

I get this:

2022-10-31 20:29:04.766 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaaaa: 25>: {'voltage': 2925, 'status': 'open'}: {'cmd': 'read_ack', 'model': 'magnet', 'sid': 'aaaaaaaaaaa', 'short_id': 6718, 'data': '{"voltage":2925,"status":"open"}'}
2022-10-31 20:29:04.766 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> is_data=True
2022-10-31 20:29:04.768 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaaaa: on>: {'voltage': 2925, 'status': 'open'}: {'cmd': 'read_ack', 'model': 'magnet', 'sid': 'aaaaaaaaaaa', 'short_id': 6718, 'data': '{"voltage":2925,"status":"open"}'}
2022-10-31 20:29:04.768 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> is_data=False
2022-10-31 20:29:34.759 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> <Entity Battery_aaaaaaaaaaa: 25>: {'voltage': 2925, 'status': 'close'}: {'cmd': 'read_ack', 'model': 'magnet', 'sid': 'aaaaaaaaaaa', 'short_id': 6718, 'data': '{"voltage":2925,"status":"close"}'}
2022-10-31 20:29:34.759 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> is_data=True
2022-10-31 20:29:34.759 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> <Entity Door Window Sensor_aaaaaaaaaaa: on>: {'voltage': 2925, 'status': 'close'}: {'cmd': 'read_ack', 'model': 'magnet', 'sid': 'aaaaaaaaaaa', 'short_id': 6718, 'data': '{"voltage":2925,"status":"close"}'}
2022-10-31 20:29:34.759 DEBUG (SyncWorker_4) [custom_components.xiaomi_aqara] PUSH >> is_data=True

Again, I closed the door, waited some time, then opened it again. The closing did register after a number of seconds (which is I think slower than it used to be, pretty much instantaneous), the reopening did not register. It's been open for 3-4 minutes now. Motion sensors don't react at all.

anxix commented 1 year ago

Gone back to 2022.9.5 and rechecked the door sensor behaviour: its state changes immediately when I open/close the door, tried several times. While the one time it works in 2022.10, it takes many seconds to register the change.

Is it possible to check the new aqara in 2022.9.5? Switching between 2022.9 and .10 is quite involved, as restoring between them - I now use backups to flip back and forth - usually also requires me to reboot the host since it often loses access to its config files.

syssi commented 1 year ago

I will try to provide some more feedback & details tomorrow.

poldim commented 1 year ago

I just checked these and they appear to be the same. I started up stable again and confirmed it's still not working. Unfortunately I'm traveling starting tomorrow so unable to do anymore testing until I get back next week (thanks @anxix).

Works: image: homeassistant/home-assistant:2022.9.7

bash-5.1# jq '.data .entries [] | select(.domain == "xiaomi_aqara")' /config/.storage/core.config_entries
{
  "entry_id": "d57ec3da1854452ca52bde28d2dd5a73",
  "version": 1,
  "domain": "xiaomi_aqara",
  "title": "Xiaomi Aqara Gateway",
  "data": {
    "host": "10.30.30.96",
    "interface": "10.30.30.100",
    "key": "<KEY>",
    "mac": "<MAC>",
    "port": 9898,
    "protocol": "1.1.2",
    "sid": "<MACish>"
  },
  "options": {},
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "zeroconf",
  "unique_id": "<MAC>",
  "disabled_by": null
}
bash-5.1# 

Broken: image: homeassistant/home-assistant:stable - currently 2022.10.3

bash-5.1# jq '.data .entries [] | select(.domain == "xiaomi_aqara")' /config/.storage/core.config_entries
{
  "entry_id": "d57ec3da1854452ca52bde28d2dd5a73",
  "version": 1,
  "domain": "xiaomi_aqara",
  "title": "Xiaomi Aqara Gateway",
  "data": {
    "host": "10.30.30.96",
    "interface": "10.30.30.100",
    "key": "<KEY>",
    "mac": "<MAC>",
    "port": 9898,
    "protocol": "1.1.2",
    "sid": "<MACish>"
  },
  "options": {},
  "pref_disable_new_entities": false,
  "pref_disable_polling": false,
  "source": "zeroconf",
  "unique_id": "<MAC>",
  "disabled_by": null
}
bash-5.1# 
sergiorademacher commented 1 year ago

Hi!

Same behavior here, the sensors report back for a while and then they stop altogether.

I used to run it on ubuntu+docker without any issues. Later I migrated to Proxmox+HassOS 2022.9, no issues. Then I upgraded to 2022.10 and it stopped working. I have tinkered my setup without success (IGMP snooping, interfaces, etc).

My current setup. Thank you!

Home Assistant 2022.10.5 Supervisor 2022.10.0 Operating System 9.3 Frontend 20221010.0 - latest

[core-ssh ~]$ jq '.data .entries [] | select(.domain == "xiaomi_aqara")' /config/.storage/core.config_entries { "entry_id": "940d4c25f5fd3cb528202f5b9394d54e", "version": 1, "domain": "xiaomi_aqara", "title": "Xiaomi Aqara Gateway Playroom", "data": { "host": "192.168.50.22", "port": 9898, "mac": "", "interface": "192.168.50.6", "protocol": "1.1.2", "key": "", "sid": "" }, "options": {}, "pref_disable_new_entities": false, "pref_disable_polling": false, "source": "zeroconf", "unique_id": "", "disabled_by": null } { "entry_id": "fc3eab5e99c53f24c7d0b9a7490ba0c3", "version": 1, "domain": "xiaomi_aqara", "title": "Xiaomi Aqara Gateway Upstairs", "data": { "host": "192.168.50.21", "port": 9898, "mac": "", "interface": "192.168.50.6", "protocol": "1.1.2", "key": "", "sid": "" }, "options": {}, "pref_disable_new_entities": false, "pref_disable_polling": false, "source": "zeroconf", "unique_id": ", "disabled_by": null } { "entry_id": "e48e3d43da415a82f90839916cb5ea53", "version": 1, "domain": "xiaomi_aqara", "title": "Xiaomi Aqara Gateway Bedroom", "data": { "host": "192.168.50.92", "port": 9898, "mac": "", "interface": "192.168.50.6", "protocol": "1.1.2", "key": "", "sid": "" }, "options": {}, "pref_disable_new_entities": false, "pref_disable_polling": false, "source": "zeroconf", "unique_id": "", "disabled_by": null } { "entry_id": "655fac008149e65ae2772691f2df6fb0", "version": 1, "domain": "xiaomi_aqara", "title": "Xiaomi Aqara Gateway Kitchen", "data": { "host": "192.168.50.20", "port": 9898, "mac": "", "interface": "192.168.50.6", "protocol": "1.1.2", "key": "", "sid": "" }, "options": {}, "pref_disable_new_entities": false, "pref_disable_polling": false, "source": "zeroconf", "unique_id": "", "disabled_by": null }

KmanOz commented 1 year ago

I fear something bigger might be going on. On my brand new install zeroconf and Homekit is broken as well. Without knowing any of the dependencies it seems it may be something that's breaking all these services perhaps. Homekit has a few people complaining about it as well.

sergiorademacher commented 1 year ago

Hi! Quick update. Last night I updated to 2022.11 and unfortunately, it has the same behavior, it is not working.

rleongcs commented 1 year ago

That's what I was afraid of. Guessed have to stay at 2022.9.7 till this is resolved. If no code changes were done to the Xiaomi Gateway (Aqara) integration, then something bigger is definitely causing all these issues as @KmanOz mentioned earlier.

syssi commented 1 year ago

We can probably disprove the assumption by using the xiaomi_aqara component of 2022.9.7 as custom component with Home Assistant >=2022.10. Just extract the attached archive to your custom_components folder (at your config directory) and restart Home Assistant: xiaomi-aqara-2022-9-7.zip

rleongcs commented 1 year ago

@syssi So just extract that and no more settings changes needed? Will try it out.

anxix commented 1 year ago

@syssi So just extract that and no more settings changes needed? Will try it out.

You'll have to edit the manifest.json file and add a version number. as I wrote here: https://github.com/home-assistant/core/issues/80249#issuecomment-1297499605

It will look like this:

{
  "domain": "xiaomi_aqara",
  "name": "Xiaomi Gateway (Aqara)",
  "config_flow": true,
  "documentation": "https://www.home-assistant.io/integrations/xiaomi_aqara",
  "requirements": ["PyXiaomiGateway==0.13.4"],
  "after_dependencies": ["discovery"],
  "codeowners": ["@danielhiversen", "@syssi"],
  "zeroconf": ["_miio._udp.local."],
  "iot_class": "local_push",
  "loggers": ["xiaomi_gateway"],
  "version": "10.0.0"
}
anxix commented 1 year ago

My first impression is that 2022.10.5 combined with the old xiaomi plugin works OK. Tried 2 motion sensors and a door sensor, all react immediately in the dashboard. But let's give it the 2.5h and see if the system remains functional.

rleongcs commented 1 year ago

Ok...doing all system backups and trying it out. I have 3 gateways and some 50 devices, so will know the outcome soon. Previously after restart of integration will work immediately and then all becomes unavailable after some hours.

I guessed I will just try out with 2022.11.0.

rleongcs commented 1 year ago

Upgraded to 2022.11.0 and everything is up and running well. Tested door sensor, motion sensors and also switches. Now the countdown begins...fingers crossed!

Update: 1.5 hrs now and all sensors and switches still working well and very responsive.

anxix commented 1 year ago

It's been 3 hours now, sensors still working properly with zipped custom component and 2022.10. Haven't tried 2022.11.0, but would assume situation is similar (I always wait for a few update packs before getting a new HA version).

rleongcs commented 1 year ago

It is now more than 2.5 hrs and all is still working fine. Looks like custom component and 2022.11.0 also working good. Will continue to monitor and report.

Update...5 hrs and everything still looking good!

anxix commented 1 year ago

5.75h and still working correctly with the old xiaomi plugin version on 2022.10.5

kanemari commented 1 year ago

looks good enough for me to try. will do so now and report back.

Rubyman77 commented 1 year ago

Tried extracting xiaomi-aqara-2022-9-7.zip to custom components folder and changing version in manifest.json then restarting but nothing changes for me. Sensors still don't update. Am I doing something wrong?