home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
72.19k stars 30.21k forks source link

New update of zwave-js and zwavejs2mqtt broke the motion detection of Fibaro FGMS001 #50598

Closed f45tb00t closed 3 years ago

f45tb00t commented 3 years ago

The problem

The motion and tampering sensors of my Fibaro FGMS001 are not updated anymore. Therefore my automations do not work. I haven't changed anything with the sensors nor the automations. The status is stuck of the sensors, either way off or on all the time.

What is version of Home Assistant Core has the issue?

core-2021.5.3

What was the last working version of Home Assistant Core?

core-2021.4.6

What type of installation are you running?

Home Assistant Core

Integration causing the issue

zwave-js, zwavejs2mqtt

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zwave_js/

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Nothing special.

2021-05-14 03:45:54 WARNING (Recorder) [homeassistant.components.recorder.migration] Database is about to upgrade. Schema version: 14
2021-05-14 03:45:58 ERROR (MainThread) [homeassistant.components.zwave_js] Failed to connect: Server disconnected
2021-05-14 03:45:58 WARNING (MainThread) [homeassistant.config_entries] Config entry 'Z-Wave JS' for zwave_js integration not ready yet: Server disconnected; Retrying in background
2021-05-14 04:05:59 WARNING (Thread-5) [homeassistant.components.mqtt] Disconnected from MQTT server 127.0.0.1:1883 (1)

But it is reachable. 

sudo netstat -tulpena | grep :3000
tcp        0      0 0.0.0.0:3000            0.0.0.0:*               LISTEN      0          22031      1201/docker-proxy   
tcp        0      0 172.18.0.1:34694        172.18.0.2:3000         VERBUNDEN   0          23325      1209/docker-proxy   
tcp6       0      0 :::3000                 :::*                    LISTEN      0          21064      1209/docker-proxy   
tcp6       0      0 ::1:56054               ::1:3000                VERBUNDEN   998        23888      1857/python3.8      
tcp6       0      0 ::1:3000                ::1:56054               VERBUNDEN   0          25716      1209/docker-proxy

Additional information

I already tried to downgrade. But unfortunately without real luck. Went back to v4.0.0 Also I've tried to remove all devices and readd them again. Same with the Stick ZW090, I did a complete factory reset and configured everything from scratch again. Only thing I've changed so far is that I added some new gosund sp111 power plugs. But this is 2.4 GHz and should not interfere at all with zwave on 868.4MHz. I've tried it as well with "Re-interview Node" and "Heal Network/Heal Node" Furthermore when remove and readd the device in my living room they work for a short period of time, but when placed back to their usual place 1 sensor works, the others don't.

Any help is much appreciated. Thanks you in advance.

cheers

fb

f45tb00t commented 3 years ago

and another occurance. Seismic still stucks. Motion stucked also, but went back to normal. Just take this log. It has everything included zwavejs_1.log_2021-05-19_2.log.txt

f45tb00t commented 3 years ago

The seismic sensor is still on stuck. The motion works meanwhile

Bildschirmfoto von 2021-05-19 19-57-00

AlCalzone commented 3 years ago

Huh, I think I know what's going on. Motion sensor status and Cover status are both variables under the Home Security type. When the idle message comes, both should be reset to idle, but only one of them is.

AlCalzone commented 3 years ago

I've raised an issue in the driver repo to track this. This was previously masked by the automatic idle reset after 5 minutes which broke some motion sensors and has since been made opt-in.

f45tb00t commented 3 years ago

Awesome that you have an idea :)

But in fact I just used the seismic sensor to even realize if the connection is broken or not. About the other handling of the status I have no idea. I'm not really a coder. But you can ask me anything about network security :P

Edit: but what do you mean with idle message? Because 3= seismic on and 0= seismic off, motion on = 8 and motion off = 0. But in the logs you'll never find 3 -> 0

AlCalzone commented 3 years ago

but what do you mean with idle message? Because 3= seismic on and 0= seismic off, motion on = 8 and motion off = 0. But in the logs you'll never find 3 -> 0

Notifications are weird. The device sends us the info that the notification sensor with the type 7 (Home Security) went to idle. Both motion and tamper are variables under that type.

Notification reports just include "type 7, value 8" - meaning "motion was detected". If we now receive "type 7, value 3" - that means "the tamper alarm is active", but the motion is still active. Later the device sends "type 7, value 0", which should mean that all variables under "Home Security" are back to idle. However, it seems that the driver just resets one of them to idle.

kpine commented 3 years ago

What about the event parameter?

2021-05-19T15:24:49.377Z DRIVER « [Node 006] [REQ] [ApplicationCommand]
                                  └─[SecurityCCCommandEncapsulation]
                                    │ sequenced: false
                                    └─[NotificationCCReport]
                                        notification type:   7
                                        notification status: 255
                                        notification event:  0x00
                                        event parameters:    
2021-05-19T15:24:49.380Z CNTRLR   [Node 006] [~] [Notification] Home Security[Motion sensor status] [Endpoint 0]
                                  : 8 => 0

The funky character for event parameters shows up as ^H in my editor, which is backspace. Backspace is decimal 8 in ASCII. Idle notification for V5 can carry the notification value that is going idle as the event. Does this not mean that the motion detection value of 8 is the one that went to idle?

AlCalzone commented 3 years ago

Good point - I probably would have stumbled across this when looking at the code. For me it renders as "BS", but this is the only event parameter that's appearing for the idle event in the log. I do see a couple of these logs

2021-05-19T10:23:53.489Z DRIVER   Dropping message with invalid data (Reason: Nonce 0xba expired, cannot decode 
                                  security encapsulated command.):
                                  0x0124000400071e9881100787baa39516742a0e98f4d98ecdf334860fbaae3c09b7d21d2b9bd4

which I cannot decode without the network key, but I suspect this is the idle message for the tamper alarm.

If that's the case, the device is reusing nonces, therefore the driver does not accept the messages. OZW did not do this check (which is a security risk btw), which would explain why it worked there.

@f45tb00t can you exclude the node and re-include it without encryption? If it works then, this is a device bug - and one I'm not going to work around.

deku-m commented 3 years ago

@AlCalzone I was watching this topic as well as im facing sort of the same issues as @f45tb00t. My logs are not encrypted btw. https://github.com/home-assistant/core/issues/50632

Some part:

2021-05-14T10:07:01.618Z SERIAL « 0x010f00040006097105000000ff0708007f                                (17 bytes)
2021-05-14T10:07:01.619Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:01.620Z DRIVER « [Node 006] [REQ] [ApplicationCommand]
                                  └─[NotificationCCReport]
                                      notification type:   Home Security
                                      notification status: 255
                                      notification state:  Motion detection
2021-05-14T10:07:01.621Z CNTRLR   [Node 006] [~] [Notification] Home Security[Motion sensor status] [Endpoint 0]
                                  : 0 => 8
2021-05-14T10:07:01.720Z SERIAL « 0x010a00040006043003ff0c33                                          (12 bytes)
2021-05-14T10:07:01.724Z CNTRLR   [Node 006] [Binary Sensor] Motion: metadata updated               [Endpoint 0]
2021-05-14T10:07:01.733Z CNTRLR   [Node 006] [~] [Binary Sensor] Motion: false => true              [Endpoint 0]
2021-05-14T10:07:01.737Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:01.740Z DRIVER « [Node 006] [REQ] [ApplicationCommand]
                                  └─[BinarySensorCCReport]
                                      type:  Motion
                                      value: true
2021-05-14T10:07:09.855Z SERIAL « 0x010f00040004097105000000ff0708007d                                (17 bytes)
2021-05-14T10:07:09.855Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:09.856Z DRIVER « [Node 004] [REQ] [ApplicationCommand]
                                  └─[NotificationCCReport]
                                      notification type:   Home Security
                                      notification status: 255
                                      notification state:  Motion detection
2021-05-14T10:07:09.856Z CNTRLR   [Node 004] [~] [Notification] Home Security[Motion sensor status] [Endpoint 0]
                                  : 8 => 8
2021-05-14T10:07:09.877Z SERIAL « 0x010a00040004043003ff0c31                                          (12 bytes)
2021-05-14T10:07:09.878Z CNTRLR   [Node 004] [Binary Sensor] Motion: metadata updated               [Endpoint 0]
2021-05-14T10:07:09.879Z CNTRLR   [Node 004] [~] [Binary Sensor] Motion: true => true               [Endpoint 0]
2021-05-14T10:07:09.880Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:09.881Z DRIVER « [Node 004] [REQ] [ApplicationCommand]
                                  └─[BinarySensorCCReport]
                                      type:  Motion
                                      value: true
2021-05-14T10:07:32.740Z SERIAL « 0x0110000400060a7105000000ff0700010862                              (18 bytes)
2021-05-14T10:07:32.741Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:32.743Z DRIVER « [Node 006] [REQ] [ApplicationCommand]
                                  └─[NotificationCCReport]
                                      notification type:   7
                                      notification status: 255
                                      notification event:  0x00
                                      event parameters:    
2021-05-14T10:07:32.747Z CNTRLR   [Node 006] [~] [Notification] Home Security[Motion sensor status] [Endpoint 0]
                                  : 8 => 0
2021-05-14T10:07:32.762Z SERIAL « 0x010a00040006043003000ccc                                          (12 bytes)
2021-05-14T10:07:32.763Z CNTRLR   [Node 006] [Binary Sensor] Motion: metadata updated               [Endpoint 0]
2021-05-14T10:07:32.767Z CNTRLR   [Node 006] [~] [Binary Sensor] Motion: true => false              [Endpoint 0]
2021-05-14T10:07:32.770Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:32.772Z DRIVER « [Node 006] [REQ] [ApplicationCommand]
                                  └─[BinarySensorCCReport]
                                      type:  Motion
                                      value: false
2021-05-14T10:07:52.589Z SERIAL « 0x0110000400040a7105000000ff0700010860                              (18 bytes)
2021-05-14T10:07:52.590Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:52.593Z DRIVER « [Node 004] [REQ] [ApplicationCommand]
                                  └─[NotificationCCReport]
                                      notification type:   7
                                      notification status: 255
                                      notification event:  0x00
                                      event parameters:    
2021-05-14T10:07:52.596Z CNTRLR   [Node 004] [~] [Notification] Home Security[Motion sensor status] [Endpoint 0]
                                  : 8 => 0
2021-05-14T10:07:52.611Z SERIAL « 0x010a00040004043003000cce                                          (12 bytes)
2021-05-14T10:07:52.613Z CNTRLR   [Node 004] [Binary Sensor] Motion: metadata updated               [Endpoint 0]
2021-05-14T10:07:52.616Z CNTRLR   [Node 004] [~] [Binary Sensor] Motion: true => false              [Endpoint 0]
2021-05-14T10:07:52.627Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:07:52.629Z DRIVER « [Node 004] [REQ] [ApplicationCommand]
                                  └─[BinarySensorCCReport]
                                      type:  Motion
                                      value: false
2021-05-14T10:09:22.491Z SERIAL « 0x010c00040005063105030a005198                                      (14 bytes)
2021-05-14T10:09:22.495Z CNTRLR   [Node 005] [Multilevel Sensor] Illuminance: metadata updated      [Endpoint 0]
2021-05-14T10:09:22.502Z CNTRLR   [Node 005] [~] [Multilevel Sensor] Illuminance: 117 => 81         [Endpoint 0]
2021-05-14T10:09:22.503Z SERIAL » [ACK]                                                                   (0x06)
2021-05-14T10:09:22.504Z DRIVER « [Node 005] [REQ] [ApplicationCommand]
                                  └─[MultilevelSensorCCReport]
                                      type:  Illuminance
                                      scale: Lux
                                      value: 81
AlCalzone commented 3 years ago

@deku-m I can't tell what is supposed to be wrong with that log. Notification and Binary Sensor reports for both triggered and idle motion sensor are received.

f45tb00t commented 3 years ago

Just removed the device in the kitchen and readded it, this time unsecure.

By the way. There is a setting I can set with zwavejs2mqtt, but not with the zwave integration in HA, as its not existing. "[9-132-0-wakeUpInterval] Wake Up interval " Standard is 7200 seconds, but I set it to 65534 seconds for all my devices to reduce the battery drain.

I'll keep you updated.

deku-m commented 3 years ago

@f45tb00t I was looking at the value under a motion sensor and i wonder if you have the same under wake-up There is mentioned the value and on the side the controller id. Is that also 0 and yours while the controller (stick) id is maybe 1..?

f45tb00t commented 3 years ago

Good morning @deku-m,

Can you give more details what you mean? You can find the whole logs attached to this incident. Can you show me what you mean? Like compare your and my logs?

By the way. One of the sensors still stuck since yesterday morning. This time not the one in the kitchen, its the floor one downstairs (secured). The third one in the bedroom(secured) works like expected. The one in the kitchen I removed and readded, but with non-secure mode as requested by AlCalzone. It's working fine so far. But without any proof this does not mean much by now, since when I healed the whole network via zwavejs2mqtt it worked for a given amount of time(all sensors secured).

@AlCalzone: Could you please explain a little more detailed why you won't change the behavior like it was before, when the sensor continues working in non-secured included mode?

AlCalzone commented 3 years ago

I should write an FAQ entry for this, since it comes up repeatedly.

Security S0 works by exchanging a so-called "nonce" (number used once) between the nodes, which is used as part of the encryption. This nonce is used to prevent replay attacks and has a short lifespan. Essentially it must be invalidated in the following situations:

Failing to do so circumvents the protection the nonce is meant to serve and leaves you open to replay attacks, e.g. by recording a secure frame to open a lock and sending it again later. Although changing this behavior will probably improve the experience for misbehaving devices and/or bad routing, I won't do it, because it is a security hole.

OZW does not invalidate nonces that have been used (AFAIK it uses a ring buffer to store the issued nonces), so when a device sends two different commands encrypted with the same nonce, it will gladly accept them.

Now, this doesn't mean per se that your device is doing something wrong - it might receive a re-transmitted version of the first nonce and think it is a response to its second requrest.

deku-m commented 3 years ago

Good morning @deku-m,

Can you give more details what you mean? You can find the whole logs attached to this incident. Can you show me what you mean? Like compare your and my logs?

By the way. One of the sensors still stuck since yesterday morning. This time not the one in the kitchen, its the floor one downstairs (secured). The third one in the bedroom(secured) works like expected. The one in the kitchen I removed and readded, but with non-secure mode as requested by AlCalzone. It's working fine so far. But without any proof this does not mean much by now, since when I healed the whole network via zwavejs2mqtt it worked for a given amount of time(all sensors secured).

@AlCalzone: Could you please explain a little more detailed why you won't change the behavior like it was before, when the sensor continues working in non-secured included mode?

Under you node in zwavejs-mqtt controlpanel you can view the options regarding the devices. afbeelding Under wake-up you see controller-ID this is 0 at my side. Im just wondering if this is different at your config. And if this is default. Not sure if this change when it wake-up and set controller-id

Also can find it in debug under wake-up i think.

f45tb00t commented 3 years ago

@AlCalzone: Okay - got it. Makes absolutely sense from a security perspective. But you changed this behavior recently? Because like said with an older version it worked before without issues. If so, surely it explains the issues me and others might have now with their devices.

Do you have any recommendation what to do? My first guess is to get in touch with Fibaro. Because I consider even my motion sensors as secure devices and want them to use encrypted transfers as well. Not sure how responsive Fibaro is in such things. Some companies don't give a ...., others act very user friendly and try to help. Especially since the Fibaro devices are not really cheap, I would like to get some service for that.

AlCalzone commented 3 years ago

But you changed this behavior recently?

No. My guess would be that something changed RF wise - new/removed/moved devices, changed routes, whatever that causes the Z-Wave protocol to transmit messages differently. You would need a Zniffer to debug that though.

Not sure how responsive Fibaro is in such things.

I don't know. Assuming that there is some unwanted retransmission happening, the device might be doing the right thing as far as it can tell. I'll investigate if I can reduce the risk of the retransmissions by changing the command flags these nonce reports are sent with.

f45tb00t commented 3 years ago

@AlCalzone: So I've tested it now with unsecure mode. I was just wondering why the kitchen light is still on... I've checked the sensor and it stucks again. Even in unsecure mode. So this cannot be the issue then I assume.

kpine commented 3 years ago

Any improvement if you associate the controller to group 3?

f45tb00t commented 3 years ago

Any improvement if you associate the controller to group 3?

To group what?

kpine commented 3 years ago

Group 3 Tamper

f45tb00t commented 3 years ago

Group 3 Tamper

I don't know what you mean.

kpine commented 3 years ago

Does the device show any association groups?

f45tb00t commented 3 years ago

Checked in zwavejs2mqtt. No group for the controller and for the nonsecure device(fibaro motion sensor):

Endpoint Group Node Target Endpoint 0 Lifeline NodeID_1 None

kpine commented 3 years ago

The device supports 5 different association groups. Group 3 is for Tamper alarms and tamper idle notifications. Maybe something is funky with this device and tamper notifications are not reliably sent via Group 1 (Lifeline), or maybe tamper idle notifications are never sent. My suggestion was to associate Group 3 to the controller to see if there's an improvement, since it should then definitely receive tamper notifications.

The <= 2.8 fw version of the device has 3 groups, and zwave-js auto-assigns Group 3 (tamper) to the controller. Group 3 is the same function in both versions. Apparently zwave-js does not auto-discover these groups, so you would need to modify the device config file to add them.

It looks like OZW and a SmartThings device handler auto-configure this device to associate Group 3 with the controller; neither of those software have the smarts to detect firmware versions, so the association is applied blindly to all versions with the same product ID.

f45tb00t commented 3 years ago

All of the Fibaros hav v3.3

I checked to add a group (on the controller), but it seems I cannot do that.

  1. No idea what to add under Node Endpoint: no endpoint or endpoint 0
  2. Group: no data available
  3. Target node: would be the fibaro sensors I guess

EDIT: Now checked one of the sensors

  1. Node Endpoint: endpoint 0
  2. Then I see under group a lot of settings. (Lifeline, Motion, Tamper, Motion BC, Temper BC)
  3. Target Node?
  4. Target Endpoint?
kpine commented 3 years ago

Yes, you are supposed to set the group on the sensor. Target node is controller. Target endpoint is just blank I think.

f45tb00t commented 3 years ago

Tried to add it. But it won't show up after adding. Also tried with Target endpoint (Endpoint 0). But won't show up at all in the groups of that device, nor on the controller under groups

kpine commented 3 years ago

You probably need to manually wake it up, like setting a config parameter.

f45tb00t commented 3 years ago

yes, that was it. Done. But guess I can check the result Sunday or Monday as I will leave tomorrow. I keep the Lifeline group? Now I have two groups set

kpine commented 3 years ago

Leave the Lifeline group as-is. Will you have debug logging to file enabled?

f45tb00t commented 3 years ago

I just disabled it again yesterday. But I'll enable it again.

By the way. I had the sensor in my living room to wake it up. So that means I was like 2 meters far from the controller. But the status still stuck.

Edit: now the motion sensor went back to idle mode.

AlCalzone commented 3 years ago

it stucks again. Even in unsecure mode. So this cannot be the issue then I assume.

No, then the messages are really not reaching the controller sometimes.

f45tb00t commented 3 years ago

No, then the messages are really not reaching the controller sometimes.

I don't get it. Coze this makes no sense. When I woke up the sensor yesterday because of adding the new group, it was less far away than two meters, with a direct line of sight. The Mini PC with the stick was directly next to me and the sensor. But the status still stuck for motion, the other sensors were updated. Like temper and lux.

By now the sensor in the kitchen is fine it seems. This time the one in my bedroom stucks. Just tested testd the sensor in the floor downstairs, also stuck.

f45tb00t commented 3 years ago

Hello,

meanwhile with the recent updates I did not see the issue occured again. So I've still the assumption that something changed in the code again which fixed the issue.

Either way this can be closed to far.

Thanks for your support in this.