home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.43k stars 30.67k forks source link

Race condition of events in Shelly BLU devices #111137

Closed FlorianWilk closed 8 months ago

FlorianWilk commented 8 months ago

The problem

My problem is a possible race condition of events in the Shelly Integration (BLU devices). I have some Shelly BLU Door/Window-Sensors. Sometimes, when i open/close the main door too quickly, the sensor in HA says "door open", although the door is closed. In the Shelly APP the status of the door is correct (closed). I GUESS the events come in to quickly and there is some kind of race condition (close-event arrives before the open-event) - but thats just a guess. I have several automations based on these kinds of events (alerts, warnings when my windows/doors are open for too long), so in these cases i have to open and close the door again (slower), so HA has the correct status.

What version of Home Assistant Core has the issue?

core-2024.2.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Shelly

Link to integration documentation on our website

No response

Diagnostics information

The only diagnostic information i can provide is the fact, that the Status of these Sensors is correct in the Shelly APP. maybe it uses some kind of timestamp to sort the events? (again, just a guess)

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 8 months ago

Hey there @balloob, @bieniu, @thecode, @chemelli74, @bdraco, mind taking a look at this issue as it has been labeled with an integration (shelly) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `shelly` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign shelly` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


shelly documentation shelly source (message by IssueLinks)

home-assistant[bot] commented 8 months ago

Hey there @ernst79, mind taking a look at this issue as it has been labeled with an integration (bthome) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `bthome` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign bthome` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


bthome documentation bthome source (message by IssueLinks)

L0bit0 commented 8 months ago

Hi

I am having similar (or exactly the same) issue.

Wired thing is that status is shown properly on Shelly app, so message is sent from the device.

Also, I tried to reload the de device, or open and close again, and status remains the same for long time.

Any workarround?

thecode commented 8 months ago

Can you enable Signal Strength sensor for the device and report the level?

Ernst79 commented 8 months ago

Just for the sake of clarity, in the opening post, you mentioned that you have this issue with the Shelly integration, but I assume it’s the BTHome integration, isn’t it?.

thecode commented 8 months ago

Just for the sake of clarity, in the opening post, you mentioned that you have this issue with the Shelly integration, but I assume it’s the BTHome integration, isn’t it?.

I guess most people assume that Shelly BLU devices are handled by Shelly integration, that is why first label added was Shelly, I didn't remove it since I also want to understand if there is indeed a Shelly involved as a Bluetooth proxy.

@FlorianWilk can you add more details about what setup you use for receiving data from the Shelly BLU devices?

L0bit0 commented 8 months ago

Can you enable Signal Strength sensor for the device and report the level?

-74 dBm

Just for the sake of clarity, in the opening post, you mentioned that you have this issue with the Shelly integration, but I assume it’s the BTHome integration, isn’t it?.

Is true the integration it is not Shelly, but for me is interesting that Shelly app can read data properly….

L0bit0 commented 8 months ago

@thecode @Ernst79 or anyone else… any idea?let me know if you need more info. Thanks in advance

thecode commented 8 months ago

I am still waiting for a response for https://github.com/home-assistant/core/issues/111137#issuecomment-1963065480: "@FlorianWilk can you add more details about what setup you use for receiving data from the Shelly BLU devices?"

FlorianWilk commented 8 months ago

@FlorianWilk can you add more details about what setup you use for receiving data from the Shelly BLU devices?

@thecode: Sorry for the delay. I have several Shelly Plug Plus, Shelly Pro 3EM, Shelly 1PM and Shelly 2PM in my home. most of them are setup as Bluetooth Gateway in the Shelly APP (they are checked as gateways, when i added the BLU Devices). Then i have several BLU Devices as the mentioned Door/Window Devices and some BLU Motion Sensors. Everything works absolutely perfect, except in these situations, when i close my main-door "too fast", so that it swings back a little (opened again for a fraction of a second) and then closes. This only happens on the main-door, because its heavier that the other doors (they don't swing back).

Then in my Homeassistant i have the Shelly Integration and - yes @L0bit0 , you were absolutely right - the BTHome Integration, which provides the data from the BLU Devices.

Does this help? I can make a video of the door/sensor when closing the door and some screenshots of the Shelly APP and HA-Sensors tomorrow if this helps identifying the issue.

I guess most people assume that Shelly BLU devices are handled by Shelly integration, that is why first label added was Shelly, I didn't remove it since I also want to understand if there is indeed a Shelly involved as a Bluetooth proxy.

sorry for the misleading description/labeling. It is indeed the BT Home Integration. When i wrote this issue i just had "Shelly Device" in my mind.

Signal Strength is -68dBm. But I don't think that it's an issue with signal strength because i don't have any issues when i close the door "slowly"/normal.

LandoGemini commented 8 months ago

Having the exact same issue as described by OP. Any help on this would be greatly appreciated. If I can provide any data points I'd be glad to do it. Setup(all on the same floor): 7 shelly blu door/window sensors. 3 shelly devices with bluetooth gateway active 1 Asus BT-400 dongle connected to my HA server If door or window is cycled(opened/closed) quickly the status of the sensor will not change back to closed in HA. In the shelly app everything is working corectly

Ernst79 commented 8 months ago

Just thinking, but could it be the case that a BLE message is received on multiple gateways, and that one of the gateways that is further away is receiving (or processing) the message later than another gateway, that is closer and which has already received a newer message?

FlorianWilk commented 8 months ago

Just thinking, but could it be the case that a BLE message is received on multiple gateways, and that one of the gateways that is further away is receiving (or processing) the message later than another gateway, that is closer and which has already received a newer message?

Sounds possible. But why does the Shelly APP have the correct status? Is there some kind of timestamp in the event which is used to sort those events in the Shelly APP but not in BT Home?

Ernst79 commented 8 months ago

Doesn't the Shelly app receive the data directly? (phone listens to the data directly, I think). That would explain why the Shelly app receives the data in the correct order.

LandoGemini commented 8 months ago

No, the shelly app gets the data from the cloud. BLU sends BT message - received by one or more shelly wired devices with Bluetooth listening enabled - forwarded to cloud

Ernst79 commented 8 months ago

hmmm, ok. There is no timestamp in the BLE message, afaik. I also thought the BLU door sensor is not using a packet_id, but I'll have to check. Otherwise, the shelly app might do filtering on the packet_id different compared to the HA integration. (e.g. that the app checks if the new packet_id> last packet_id, while the HA integration only filters advertisements where the new packet_id is the same as the last packet_id. But, as said, I don't think the door sensor uses this packet_id, I'll check tonight.

RASBR commented 8 months ago

Hi, I am facing the same problem which I posted on the Community also here

I hope a solution comes out soon.

LandoGemini commented 8 months ago

hmmm, ok. There is no timestamp in the BLE message, afaik. I also thought the BLU door sensor is not using a packet_id, but I'll have to check. Otherwise, the shelly app might do filtering on the packet_id different compared to the HA integration. (e.g. that the app checks if the new packet_id> last packet_id, while the HA integration only filters advertisements where the new packet_id is the same as the last packet_id. But, as said, I don't think the door sensor uses this packet_id, I'll check tonight.

Screenshot_20240229_190344_Home Assistant I believe it does provide packet_id. The screenshot is from one of the sensors

Ernst79 commented 8 months ago

Yep, I also found that. Is someone able to manually modify tha xiaomi-ble pypi package on his/her system, such that we change this line? I don’t have a Shelly BLE gateway, so I can’t test it myself.

https://github.com/Bluetooth-Devices/bthome-ble/blob/3bf6e69bf3c6f81bcd28ae7bbb7119be9e7b4876/src/bthome_ble/parser.py#L438

if new_packet_id == last_packet_id:

to

if new_packet_id <= last_packet_id:
thecode commented 8 months ago

Yep, I also found that. Is someone able to manually modify tha xiaomi-ble pypi package on his/her system, such that we

if new_packet_id <= last_packet_id:

I didn't experience the problem described here, however I tested a door/window on a desk and after about 20 times with trying to do it fast as possible I succeeded to make the state stay open while the magnet is closed.

I am still not sure if it is a race condition issue or a lost packet since adding the same sensor to two HA instances I could cause it to show open on one and closed on the other, I will try to log the packets and see if a packet is missing or arrives in a different order.

If I can reproduce it and be sure it is a race condition I will make the suggested change. Will report back with my results later today.

Ernst79 commented 8 months ago

Please note that the suggested change is only for testing. E.g. if the counter goes from 255 to 1, it will skip following messages. So, only use it temporary, when the counter is low, and keep an eye on the counter value. Alternative is checking the BLE advertisements by enabling debug logging for the Bluetooth integration, and checking the order of the incoming BLE advertisements. If the order is not increasing +1, but +2 or more, than a message probably was missed. If it decreases, it is received in the wrong order, most likely due to the multi-gateway setup. The last can be fixed if that is the case.

thecode commented 8 months ago

For now my tests just shows that packets are missing (although the packets are with high RSSI, -54dBm):

44 00 c2 01 64 05 bc 2e 01 2d 01 3f 00 00
44 00 c4 01 64 05 40 32 01 2d 01 3f 00 00
44 00 c5 01 64 05 10 3a 01 2d 00 3f 00 00
44 00 c7 01 64 05 28 36 01 2d 01 3f 38 00
44 00 c9 01 64 05 54 88 00 2d 01 3f 00 00

After packet C7 I left the device for few minutes, it was shown as open, while it was actually closed. after few minutes I opened the device and as can be seen it sent a packet indicating a state change to open.

I don't know why there are so many missing packets but this is another issue.

Would be good if someone with the race condition problem provide a log so we can check it, we recently changed the scanner in aioshelly to send data in chunks so possibly with multiple proxies one will reach HA before the other.

FlorianWilk commented 8 months ago

Would be good if someone with the race condition problem provide a log so we can check it, we recently changed the scanner in aioshelly to send data in chunks so possibly with multiple proxies one will reach HA before the other.

could you give me (HA n00b) a hint of what logs exactly you need? I can easily reproduce this behaviour (it just happened again 10minutes ago). Regarding the multiple proxies: i had the same issue when i had only one device as Proxy (Shelly Plug Plus next to the front door). Thats why i decided to use some of the other devices as proxy as well.

thecode commented 8 months ago

could you give me (HA n00b) a hint of what logs exactly you need? I can easily reproduce this behaviour (it just happened again 10minutes ago). Regarding the multiple proxies: i had the same issue when i had only one device as Proxy (Shelly Plug Plus next to the front door). Thats why i decided to use some of the other devices as proxy as well.

Please add the following to configuration.yaml, restart home assistant, reproduce and attach the log.

logger:
  default: info
  logs:
    homeassistant.components.bluetooth: debug
    bthome-ble: debug

Please detail the time of the event and the MAC of the Shelly DW

Note: it is better to drag the log into the comment (which will add it as an attachment) and not copy paste as it is hard to read logs in GitHub.

Thanks

FlorianWilk commented 8 months ago

Here we go. I reproduced the behaviour in the basement (not the main door), because it's too loud and our baby is sleeping. The Basement door has a lower signal-strength but i can reproduce it with the main door later, so you can see it's propably not an issue of the signal-strength. I had to try it several times to have it in the wrong state.

Some Screenshots:

kellertuer shellykeller

Here is the log. I did a grep "bthome\|components.bluetooth" Downloads/home-assistant_bthome.log > btkellertuer.log to delete the noise:

btkellertuer.log

"Open" state received at 10:42:34. Mac of the Shelly is 3C:2E:F5:FB:AA:C0

kelleropened

Log of the ShellyAPP:

shellylog

Hope this helps.

thecode commented 8 months ago

Here we go. I reproduced the behaviour in the basement (not the main door), because it's too loud and our baby is sleeping.

btkellertuer.log

"Open" state received at 10:42:34. Mac of the Shelly is 3C:2E:F5:FB:AA:C0

kelleropened

Log of the ShellyAPP:

shellylog

Thanks @FlorianWilk Last two lines from your log:

2024-03-01 10:42:34.729 DEBUG (MainThread) [homeassistant.components.bluetooth.manager] shellypro3em-34987a453d70 (34:98:7A:45:3D:70): 3C:2E:F5:FB:AA:C0 AdvertisementData(local_name='SBDW-002C', manufacturer_data={2985: b'\x01\x01\x00\x0b\x02\x00\n\xc0\xaa\xfb\xf5.<'}, service_data={'0000fcd2-0000-1000-8000-00805f9b34fb': b'D\x004\x01d\x05\xb0\x04\x00-\x00?\x00\x00'}, tx_power=-127, rssi=-73) match: set()
2024-03-01 10:42:34.729 DEBUG (MainThread) [homeassistant.components.bluetooth.manager] shellypro3em-34987a453d70 (34:98:7A:45:3D:70): 3C:2E:F5:FB:AA:C0 AdvertisementData(local_name='SBDW-002C', manufacturer_data={2985: b'\x01\x01\x00\x0b\x02\x00\n\xc0\xaa\xfb\xf5.<'}, service_data={'0000fcd2-0000-1000-8000-00805f9b34fb': b'D\x003\x01d\x05\xb0\x04\x00-\x01?\x00\x00'}, tx_power=-127, rssi=-75) match: set()

Translates two: 2024-03-01 10:42:34.729 44 00 Packet ID: 34 01 64 05 b0 04 00 2d State: 00 (closed) 3f 00 00 2024-03-01 10:42:34.729 44 00 Packet ID: 33 01 64 05 b0 04 00 2d State: 01 (open) 3f 00 00

So it is confirmed that the packets arrived at wrong order and the last one was discarded, interesting that they arrive from the same device (Shelly proxy) but still in wrong order. I will update when we have a solution. Many thanks

FlorianWilk commented 8 months ago

Here we go. I reproduced the behaviour in the basement (not the main door), because it's too loud and our baby is sleeping. btkellertuer.log "Open" state received at 10:42:34. Mac of the Shelly is 3C:2E:F5:FB:AA:C0 kelleropened Log of the ShellyAPP: shellylog

Thanks @FlorianWilk Last two lines from your log:

2024-03-01 10:42:34.729 DEBUG (MainThread) [homeassistant.components.bluetooth.manager] shellypro3em-34987a453d70 (34:98:7A:45:3D:70): 3C:2E:F5:FB:AA:C0 AdvertisementData(local_name='SBDW-002C', manufacturer_data={2985: b'\x01\x01\x00\x0b\x02\x00\n\xc0\xaa\xfb\xf5.<'}, service_data={'0000fcd2-0000-1000-8000-00805f9b34fb': b'D\x004\x01d\x05\xb0\x04\x00-\x00?\x00\x00'}, tx_power=-127, rssi=-73) match: set()
2024-03-01 10:42:34.729 DEBUG (MainThread) [homeassistant.components.bluetooth.manager] shellypro3em-34987a453d70 (34:98:7A:45:3D:70): 3C:2E:F5:FB:AA:C0 AdvertisementData(local_name='SBDW-002C', manufacturer_data={2985: b'\x01\x01\x00\x0b\x02\x00\n\xc0\xaa\xfb\xf5.<'}, service_data={'0000fcd2-0000-1000-8000-00805f9b34fb': b'D\x003\x01d\x05\xb0\x04\x00-\x01?\x00\x00'}, tx_power=-127, rssi=-75) match: set()

Translates two: 2024-03-01 10:42:34.729 44 00 Packet ID: 34 01 64 05 b0 04 00 2d State: 00 (closed) 3f 00 00 2024-03-01 10:42:34.729 44 00 Packet ID: 33 01 64 05 b0 04 00 2d State: 01 (open) 3f 00 00

So it is confirmed that the packets arrived at wrong order and the last one was discarded, interesting that they arrive from the same device (Shelly proxy) but still in wrong order. I will update when we have a solution. Many thanks

Glad i could help a bit :)

Just a thought: You could sort them by PacketIDs when the packateIDs are between a defined threshold (f.ex. 5, 10,30 whatever). So 240,239,242,255,0,1,2 would become 239,240,242,255,0,1,2

thecode commented 8 months ago

@FlorianWilk a fix will be added to 2024.3.0. I would like to check with Shelly why the order of the packets is incorrect, can you add your firmware version for the Shelly Pro 3EM? Thanks

FlorianWilk commented 8 months ago

@FlorianWilk a fix will be added to 2024.3.0. I would like to check with Shelly why the order of the packets is incorrect, can you add your firmware version for the Shelly Pro 3EM? Thanks

Sure, the Shelly 3EM which was Proxy for the requests from the log has Firmware 20240222-085114/1.2.1-gf4a1681, but i will update to 1.2.2 today. The other 3EM, which is also a proxy has 20240223-141900/1.2.2-g7c39781.

FlorianWilk commented 7 months ago

Veery strange. Since the update to 2024.3.0 everything was working without any issues. Until now.

Shelly APP - saying Maindoor is closed WhatsApp Image 2024-03-17 at 18 13 18

HA - saying Maindoor is open ("wurde geöffnet" = "is open") WhatsApp Image 2024-03-17 at 18 13 17

Since i haven't had any problems since the update, i did not have debug logging enabled. I am running 2024.3.1 now. Has something changed from 2024.3.0 to 2024.3.1?

I thought that maybe the PacketID switched from 255 to 0, the race condition occured at the same time and the new bugfixed logic couldn't handle that. But the PacketID at 17:49 for this sensor was around 50-60.

In general, however, the behaviour has absolutely improved. That was just an exotic exception.

LandoGemini commented 7 months ago

Same here, I'm on 2024.3.1 and just noticed today same issue again.

Ernst79 commented 7 months ago

There have been two changes in 2024.3.1

https://github.com/home-assistant/core/pull/112783 https://github.com/home-assistant/core/pull/113008

The first PR is not likely to affect the behavior, the second might be.

in the last one, we bumped bthome-ble to 3.8.0. It has some changes to allow one decryption failure, before asking to reauthorize the encryption key.

https://github.com/Bluetooth-Devices/bthome-ble/pull/114

I looked at the change, but I don’t see it yet, how it can cause your issue.

I think we need some new logs again to figure this out, what is going wrong. If you have new logs, please create a new issue or reopen this one.

thecode commented 7 months ago

When I tried to reproduce in the past this issue, I had several occasions that I thought that I reproduced it and when looking at the log I found out that the last state is just missing.

Note that we only filter a non-recent packet from taking the state of a newer packet, but if the last packet is missing the state will still be incorrect.

As @Ernst79 asked, a log will be helpful to understand. I can reopen this issue, but if you don't mind I prefer a new issue, you can link to this one in the description.

FlorianWilk commented 7 months ago

When I tried to reproduce in the past this issue, I had several occasions that I thought that I reproduced it and when looking at the log I found out that the last state is just missing.

Note that we only filter a non-recent packet from taking the state of a newer packet, but if the last packet is missing the state will still be incorrect.

As @Ernst79 asked, a log will be helpful to understand. I can reopen this issue, but if you don't mind I prefer a new issue, you can link to this one in the description.

Correct me if i am wrong, but since the Shelly APP has the right state, the package somehow reached the proxy device and was transmitted to the ShellyCloud. So i guess it's not a missing package, right?

I have activated debug logging again and will try to reproduce it tomorrow.

FlorianWilk commented 7 months ago

Same here, I'm on 2024.3.1 and just noticed today same issue again.

@LandoGemini In my case one of the ProxyDevices did not have BLE Observer / BT Proxy enabled anymore. Not sure how that happened, but maybe this could also be a solution for your problem.

Update: I was wrong. The above didn't solve this issue. I reopened the new issue again :(