mKeRix / room-assistant

Presence tracking and more for automation on the room-level
https://www.room-assistant.io
MIT License

Occasional incorrect not_homes #142

Closed Alfiegerner closed 4 years ago

Alfiegerner commented 4 years ago

Describe the bug: Occasional incorrect not_home states.

I have provided an example in the logs, narrowed down to one device and time. The not_home was sent at 8:17:18. I included the leader log and the parents_bedroom node log. The device was very close to the parents_bedroom node.

To reproduce: I have not been able to replicate it at will, but it happens a few times a day.

Relevant logs: logs

Relevant configuration: I don't think the config is relevant, but I will add it if you want me to.

Expected behavior: not_home should not be sent if a response has been received from a node within the last cycle.
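The expected behavior can be illustrated with a minimal sketch. It assumes each node records when it last got a response from the tracked device. The function name and data layout are hypothetical and are not room-assistant's actual implementation.

```python
import time

# Hypothetical model: node name -> timestamp (seconds) of the last response
# that node received from the tracked device.
def presence_state(last_seen_by_node, timeout_s, now=None):
    """Return 'home' if any node heard the device within timeout_s, else 'not_home'."""
    if now is None:
        now = time.time()
    if not last_seen_by_node:
        return "not_home"
    # Only the freshest response from any node matters.
    newest = max(last_seen_by_node.values())
    return "home" if now - newest <= timeout_s else "not_home"

# parents_bedroom heard the device 5 s ago, the leader 40 s ago.
seen = {"parents_bedroom": 1000.0, "leader": 965.0}
print(presence_state(seen, timeout_s=30, now=1005.0))  # home
print(presence_state(seen, timeout_s=30, now=1040.0))  # not_home
```

Under this rule, a single recent response from any node is enough to keep the device at home, which is what the report expects.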

Environment:

Additional context: N/A

Alfiegerner commented 4 years ago

Any idea what might be causing this? The not_home is always corrected quickly (in the example above, 1 second after the not_home, and I think always within 10 seconds), so this isn't super serious and is easy to handle with delays in HA.

Thanks!

mKeRix commented 4 years ago

My instinct tells me that maybe the timeout is still just a tiny bit too short for BT Classic sensors, but I'm not entirely sure. I think a good first step would be making that timeout configurable, then you can increase it slightly and check if that takes care of it. The logs that you provided look normal to me at first glance.

Alfiegerner commented 4 years ago

Thanks for looking at it. Happy to try a configurable timeout, but no rush. Thanks 🤙

mKeRix commented 4 years ago

Can you try upping timeoutCycles slightly after upgrading to 2.2.0? :)

doublej0 commented 4 years ago

I have the same experience. It started recently with the version before 2.2.0 and continues with 2.2.0. I am using an iPhone X and an Apple Watch 3, and both report "not_home" at the same time. I have 4 rPi Zero Ws.

I turned off all 4 rPis and changed my config to test a Mi Band 4 using BLE, thinking the problem was BT Classic, but it has the same issue.

mKeRix commented 4 years ago

@doublej0 did you try raising timeoutCycles? Sometimes you can also run into this sort of behavior if your cluster connections are acting up, in that case it might help to manually define peerAddresses and a quorum in the cluster settings.

doublej0 commented 4 years ago


I raised timeoutCycles from 5 (the default) to 10 and then 15 for my iPhone and Apple Watch using bluetoothClassic, but I still got incorrect "not_home" states.

For the Mi Band 4 using BLE, setting the timeoutCycles to 15 seems to have done the trick.

I will continue testing the bluetoothClassic by adjusting the cluster options to see if that works. If one rPi goes offline, that should not impact the other rPi if the devices are within range, correct?

mKeRix commented 4 years ago

timeoutCycles is only valid for BT Classic, so if you used BLE for the Mi Band 4 that setting shouldn't have made a difference.

And yes, the clustering is built so that things keep working even if one node is temporarily unavailable. However, the node that lost its connection might think it is now in its own new cluster and start overriding distributed entities (like the BT sensors). A quorum gets rid of this "split brain" issue, as it only allows a cluster that contains the majority of nodes to make decisions (which means only one cluster in the network controls the distributed entities).
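The majority rule described above can be sketched as a generic majority-quorum check. This is not room-assistant's code: the function names are hypothetical, and the sketch assumes every node knows the total cluster size (for example, from peerAddresses).

```python
def majority_quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1

def may_control_entities(visible_nodes: int, total_nodes: int) -> bool:
    """A partition may update distributed entities only if it holds a majority.

    visible_nodes counts the node itself plus the peers it can still reach.
    """
    return visible_nodes >= majority_quorum(total_nodes)

# A 4-node cluster split 3/1: only the 3-node side keeps control.
print(majority_quorum(4))          # 3
print(may_control_entities(3, 4))  # True
print(may_control_entities(1, 4))  # False
# A 3-node cluster: a majority is 2, so one node can drop out safely.
print(majority_quorum(3))          # 2
```

Because a strict majority is required, an even split (2/2 in a four-node cluster) leaves neither side in control. That is the intended trade-off: briefly having no controller is preferred over two clusters overriding each other.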

dimmanramone commented 4 years ago

I have the same problem, with 3 devices in the cluster: a NUC, a Pi 3 and a Pi Zero.

My configuration looks like the following for almost all the devices in the cluster:

global:
  instanceName: room1
  integrations:
    - homeAssistant
    - bluetoothClassic
cluster:
  networkInterface: eno1
  port: 6425
  weight: 15 # NUC; 10 on the Pi 3, 5 on the Pi Zero
  peerAddresses:
    - '192.168.0.1:6425'
    - '192.168.0.2:6425'
    - '192.168.0.3:6425'
homeAssistant:
  mqttUrl: 'mqtt://192.168.0.1:1883'
bluetoothClassic:
  minRssi: -20
  interval: 6
  timeoutCycles: 10
  addresses:
    - '3x:2x:6x:1x:1x:fx'
    - '4y:4y:3y:by:9y:9y'

Any ideas? Will a quorum help, and if so, which value is the right one? 3?

Alfiegerner commented 4 years ago

@mKeRix, apologies for the delay in getting back to you.

Raising timeoutCycles to 3 worked for me.

Keeping timeoutCycles at 2 and bumping interval to 8 also works (I haven't tried 7 yet). That is better for me, because with 9 nodes each additional cycle adds a bigger chunk of time.
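The trade-off can be put into rough numbers. This sketch ASSUMES that one cycle lasts about interval × number of nodes, because the nodes take turns running a BT Classic inquiry. That assumption is inferred from the comment about 9 nodes and is not taken from room-assistant's source. The 6 s baseline interval is a hypothetical example value, and the function name is illustrative.

```python
def away_timeout_s(interval_s: float, nodes: int, timeout_cycles: int) -> float:
    """Approximate time before a silent device is marked not_home.

    ASSUMPTION: one cycle = interval_s * nodes (nodes query in turn).
    This is an inference from the discussion, not room-assistant's formula.
    """
    cycle_s = interval_s * nodes
    return cycle_s * timeout_cycles

# Nine nodes, comparing the two settings reported above
# (6 s is a hypothetical baseline interval).
print(away_timeout_s(6, 9, 3))  # 162
print(away_timeout_s(8, 9, 2))  # 144
```

Under this assumption, raising interval to 8 while keeping 2 cycles lengthens each cycle but still gives a shorter overall timeout than adding a third cycle at the baseline interval. That is consistent with the preference described above.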

I've also found that rebooting the Pi Zeros every night seems to have an impact. Nightly reboots plus interval 8 have worked well for 48 hours for me.

dimmanramone commented 4 years ago

@Alfiegerner @mKeRix I guess I have to try some combinations. It seems that interval 6 and timeoutCycles 10 work a little better (I'm still getting incorrect not_homes), but not for all devices :/ Also, my Pi 3 seems to have crashed, and I have occasionally seen high CPU usage from room-assistant, even on my NUC.

mKeRix commented 4 years ago

@dimmanramone room-assistant shouldn't use a lot of CPU and didn't in my testing, but to be fair I don't have permanent monitoring on my Pis (yet). I've also not seen crashes yet, just Pis dropping off the WiFi when running BT Classic, presumably because the shared chip that handles both of these things messed up. Either way, Bluetooth is finicky and devices implement it differently, so unfortunately a lot of this comes down to trying things out. I'm happy to take on feature requests if you have other ideas to solve these issues!

For now I'll close this, but feel free to re-open the ticket if the problem comes back.

dimmanramone commented 4 years ago

@mKeRix Well, it shouldn't, but unfortunately it does in some cases. See the screenshot for an example from my NUC running Hass.io: you can see that it uses a lot of CPU and RAM.

[Screenshot, 2020-04-13 12:37:10: room-assistant CPU and RAM usage on the NUC]

As for my Raspberry Pi Zero and Pi 3, it seems the BT hangs but the WiFi still works. I'll try using Ethernet on both instead and see if it helps.