After a binary switch is dead, we can not recover it

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Add a binary switch and control it. This works.
2. Remove power from the switch and wait a couple of minutes until it is 
declared dead.
3. Restore power to the binary switch. It can no longer be controlled. All 
commands are dropped.

I tried sending node info from the binary switch. OZW receives it. However, 
it does nothing to bring the node out of the dead state.

Original issue reported on code.google.com by wyc...@gmail.com on 14 Mar 2013 at 9:22

GoogleCodeExporter commented 9 years ago

I have the same issue

Original comment by vinc...@besson.be on 27 Mar 2013 at 8:23

GoogleCodeExporter commented 9 years ago

Previously, receiving node info did not revive dead nodes. The latest version 
handles this case. Please report back here whether it works for you.

Original comment by glsatz on 16 Apr 2013 at 4:44

GoogleCodeExporter commented 9 years ago

Assuming this one is complete.

Original comment by glsatz on 24 Apr 2013 at 4:05

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Issue 185 has been merged into this issue.

Original comment by glsatz on 24 Apr 2013 at 4:07

GoogleCodeExporter commented 9 years ago

I am having this issue using a Zstick from Aeon Labs and a Fibaro Wall Plug. If 
I init the library without the Fibaro plug connected, it never recovers 
from Dead status. (I receive value-changed events but can't set values.) 

Is there anything I need to do to force a node to wake up once it is back 
on the network?

Original comment by nuno.ny...@gmail.com on 9 Aug 2013 at 4:00

GoogleCodeExporter commented 9 years ago

I also have this issue with a Zstick and a Fibaro switch. 

I started open-zwave with the switch unplugged. Then I plugged in the switch.

open-zwave receives value changes from the switch but does not send commands to 
it as it remains presumed dead.

So I believe this one should be reopened.

-sbi

Original comment by stephane...@gmail.com on 11 Aug 2013 at 10:59

GoogleCodeExporter commented 9 years ago

I have this issue too, but I'm using a Fibaro Wall Plug with the Zstick on 
OpenRemote. I am not sure whether OpenRemote also uses OpenZWave, but the issue 
is similar.

Original comment by rmarob...@gmail.com on 2 Nov 2013 at 9:18

GoogleCodeExporter commented 9 years ago

Try this patch and let me know what happens:

Index: cpp/src/Driver.cpp
===================================================================
--- cpp/src/Driver.cpp  (revision 676)
+++ cpp/src/Driver.cpp  (working copy)
@@ -3264,6 +3264,11 @@
                {
                        node->m_receivedUnsolicited++;
                }
+
+               if( !node->IsNodeAlive() )
+               {
+                       node->SetNodeAlive( true );
+               }
        }
        if( ApplicationStatus::StaticGetCommandClassId() == classId )
        {

Original comment by glsatz on 3 Nov 2013 at 6:10

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Tried the patch and it works now, thank you very much.

One thing though: with this patch a node will never be marked as dead, am I 
correct?
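
As far as the patch itself shows, the answer should be no: the added lines only run when a frame arrives from the node, so a node that keeps failing to acknowledge sends can still be presumed dead. Below is a minimal C++ sketch of that reading. `NodeState`, `OnSendResult`, `OnFrameReceived` and the retry limit of 3 are hypothetical stand-ins for illustration, not OpenZWave's actual classes or logic.

```cpp
#include <cstdint>

// Hypothetical stand-in for per-node state; this is NOT OpenZWave's Node class.
struct NodeState {
    bool alive = true;
    int failedSends = 0;                    // consecutive sends with no ACK
    std::uint32_t receivedUnsolicited = 0;  // frames received from the node
};

// Assumed retry limit, chosen only for this illustration.
constexpr int kMaxSendFailures = 3;

// Called when a send to the node completes.
// Repeated failures still mark the node dead, with or without the patch.
inline void OnSendResult(NodeState& node, bool acked) {
    if (acked) {
        node.failedSends = 0;
        return;
    }
    if (++node.failedSends >= kMaxSendFailures) {
        node.alive = false;
    }
}

// Called when a frame arrives from the node.
// Simplified version of what the patch adds: any received frame revives the node.
inline void OnFrameReceived(NodeState& node) {
    node.receivedUnsolicited++;
    if (!node.alive) {
        node.alive = true;
        node.failedSends = 0;
    }
}
```

In this sketch a node goes dead after three unacknowledged sends and comes back only when `OnFrameReceived` runs, so a device that never transmits on its own would stay dead.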

Original comment by nuno.ny...@gmail.com on 6 Nov 2013 at 1:40

GoogleCodeExporter commented 9 years ago

This issue was closed by revision r690.

Original comment by jus...@dynam.ac on 14 Nov 2013 at 2:46

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Patch applied to the 2013-11-13_release_testing branch.

Original comment by jus...@dynam.ac on 14 Nov 2013 at 2:47

GoogleCodeExporter commented 9 years ago

I'm seeing this same issue running the current source code. I have verified 
that the patch is in fact in my source base.

2014-05-07 18:17:15.208 Node003, Sending (NoOp) message (Callback ID=0x02, 
Expected Reply=0x13) - NoOperation_Set (Node=3): 0x01, 0x09, 0x00, 0x13, 0x03, 
0x02, 0x00, 0x00, 0x25, 0x02, 0xc3
2014-05-07 18:17:15.216 Node003,   Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
2014-05-07 18:17:15.219 Node003,   ZW_SEND_DATA delivered to Z-Wave stack
2014-05-07 18:17:15.353 Node003,   Received: 0x01, 0x05, 0x00, 0x13, 0x02, 
0x01, 0xea
2014-05-07 18:17:15.357 Node003,   ZW_SEND_DATA Request with callback ID 0x02 
received (expected 0x02)
2014-05-07 18:17:15.360 Node003, WARNING: ZW_SEND_DATA failed. No ACK received 
- device may be asleep.
2014-05-07 18:17:15.365 Node003, WARNING: Device is not a sleeping node.
2014-05-07 18:17:15.368 Node003, QueryStageRetry stage Probe requested stage 
Probe max 3 retries 1 pending 1
2014-05-07 18:17:15.372 Node003,   Expected reply was received
2014-05-07 18:17:15.376 Node003,   Message transaction complete
2014-05-07 18:17:15.380 
2014-05-07 18:17:15.384 Node003, Removing current message
2014-05-07 18:17:15.388 Node003, Query Stage Complete (Probe)
2014-05-07 18:17:15.391 Node003, AdvanceQueries queryPending=0 queryRetries=2 
queryStage=Probe live=1
2014-05-07 18:17:15.395 Node003, QueryStage_Probe
2014-05-07 18:17:15.399 Node003, NoOperation::Set - Routing=true
2014-05-07 18:17:15.403 Node003, Queuing (NoOp) NoOperation_Set (Node=3): 0x01, 
0x09, 0x00, 0x13, 0x03, 0x02, 0x00, 0x00, 0x25, 0x03, 0xc2
2014-05-07 18:17:15.407 Node003, Queuing (Query) Query Stage Complete (Probe)
2014-05-07 18:17:15.411 
2014-05-07 18:17:15.414 Node003, Sending (NoOp) message (Callback ID=0x03, 
Expected Reply=0x13) - NoOperation_Set (Node=3): 0x01, 0x09, 0x00, 0x13, 0x03, 
0x02, 0x00, 0x00, 0x25, 0x03, 0xc2
2014-05-07 18:17:15.423 Node003,   Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8
2014-05-07 18:17:15.426 Node003,   ZW_SEND_DATA delivered to Z-Wave stack
2014-05-07 18:17:15.599 Node003,   Received: 0x01, 0x05, 0x00, 0x13, 0x03, 
0x01, 0xeb
2014-05-07 18:17:15.602 Node003,   ZW_SEND_DATA Request with callback ID 0x03 
received (expected 0x03)
2014-05-07 18:17:15.606 Node003, WARNING: ZW_SEND_DATA failed. No ACK received 
- device may be asleep.
2014-05-07 18:17:15.609 Node003, WARNING: Device is not a sleeping node.
2014-05-07 18:17:15.614 Node003, ERROR: node presumed dead
2014-05-07 18:17:15.617 CheckCompletedNodeQueries m_allNodesQueried=0 
m_awakeNodesQueried=0
2014-05-07 18:17:15.621 CheckCompletedNodeQueries all=1, deadFound=1 
sleepingOnly=1
2014-05-07 18:17:15.625          Node query processing complete except for dead 
nodes.
2014-05-07 18:17:15.629 Node003, QueryStageRetry stage Probe requested stage 
Probe max 3 retries 2 pending 1
2014-05-07 18:17:15.633 Node003,   Expected reply was received
2014-05-07 18:17:15.636 Node003,   Message transaction complete
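
The frames in a log like this can be checked by hand. The sketch below is not OpenZWave code. It assumes the usual Z-Wave serial framing (SOF 0x01, length, type, function ID, payload, then an XOR checksum seeded with 0xFF) and decodes a ZW_SEND_DATA (0x13) callback. The names `FrameChecksum`, `ChecksumOk` and `ParseSendDataCallback` are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR checksum: start from 0xFF and XOR in every byte after the SOF,
// up to but not including the final (checksum) byte.
inline std::uint8_t FrameChecksum(const std::vector<std::uint8_t>& frame) {
    std::uint8_t sum = 0xFF;
    for (std::size_t i = 1; i + 1 < frame.size(); ++i) {
        sum ^= frame[i];
    }
    return sum;
}

// True when the last byte of the frame matches the computed checksum.
inline bool ChecksumOk(const std::vector<std::uint8_t>& frame) {
    return frame.size() >= 3 && frame.back() == FrameChecksum(frame);
}

// Fields of interest in a ZW_SEND_DATA callback frame.
struct SendDataCallback {
    std::uint8_t callbackId;
    std::uint8_t txStatus;  // 0x00 = delivered, 0x01 = no ACK received
};

// Assumed layout: SOF, length, type (0x00 = request), 0x13, callback ID,
// transmit status, optional extra bytes, checksum.
inline bool ParseSendDataCallback(const std::vector<std::uint8_t>& f,
                                  SendDataCallback& out) {
    if (f.size() < 7 || f[0] != 0x01 || f[2] != 0x00 || f[3] != 0x13 ||
        !ChecksumOk(f)) {
        return false;
    }
    out.callbackId = f[4];
    out.txStatus = f[5];
    return true;
}
```

Fed the frame `0x01, 0x05, 0x00, 0x13, 0x02, 0x01, 0xea` from the log, this yields callback ID 0x02 and transmit status 0x01, which is the value the log reports as "No ACK received".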

Original comment by wesw...@gmail.com on 8 May 2014 at 1:31

GoogleCodeExporter commented 9 years ago

Same for me. Should we open a new issue?

Original comment by xavier.m...@gmail.com on 26 Jun 2014 at 4:20

GoogleCodeExporter commented 9 years ago

Please attach a complete log file to this issue so I can see what's going on.

Ideally, let OZW run for a while after it "presumes" the node is dead.

Original comment by jus...@dynam.ac on 30 Jun 2014 at 2:20

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Finally getting back to this. Here is a fresh log. I deleted the 
zwcfg_0x016a31a6.xml file and the OZW_Log.txt before this run so it did a 
complete detect of the nodes. I am using the OZWForm application and version 
1.0.791 of OpenZWave. Node #2 is believed to be dead and I have no idea why. 
Furthermore I do not know how to make it usable again. Any help would be great. 
Thanks.

Original comment by wesw...@gmail.com on 16 Aug 2014 at 6:15

GoogleCodeExporter commented 9 years ago

I am facing the same issue. The patch above works for the Fibaro wall plug 
but not for the Fibaro dimmer/relay. The wall plug sends events when turned 
on, and that is where the patch marks the node alive again. This is not true 
for the dimmer/relay: these devices do not send any events. Is there a way to 
solve this problem?

Thanks,
vibhav

Original comment by Vibhav.B...@gmail.com on 7 Oct 2014 at 4:22

GoogleCodeExporter commented 9 years ago

I'm experiencing the same issue (OZW 1.2.919).
It happened today with a "FIBARO System FGRGBWM441 RGBW Controller". I powered off 
the switch and restored power after some hours, but for OZW 
the switch is still dead. Only restarting the controller (and waiting nearly 30 
minutes for OZW to initialize the Z-Wave network) makes it work again.

Problematic node: Node060 (0x3c)

Kind Regards

Original comment by ugo.v...@gmail.com on 18 Nov 2014 at 8:35

Attachments:

OZW_Log.txt.bz2

GoogleCodeExporter commented 9 years ago

Looking at the issue, it seems open-zwave should perhaps send a regular NOP 
request to each node on the dead list. This is the only way to detect that a 
node is alive again when it never sends a Z-Wave event itself (like the wall plug).

Original comment by uAle...@gmail.com on 18 Nov 2014 at 8:45

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Your comment is completely useless other than what I consider a rant. Logs, 
detailed descriptions how to reproduce the issue, or suggestions would help 
much more than complaining. If you feel OZW is a toy feel free to go back to 
the razberry software, or Homeseer, or Home Control or the dozens of other 
commercial options that are available. 

Or, instead of insulting the devs of OZW, you could have politely asked for 
help or suggestions. Personally, I'd be saying if outages are a concern, you 
shouldn't be relying on automation alone.

Original comment by jus...@dynam.ac on 9 Mar 2015 at 3:59

GoogleCodeExporter commented 9 years ago

Attached my logs.
For me, the Fibaro Wall Plugs do not recover once they are dead.

Using the system: http://www.Pulse-Station.com

Log excerpt:

So I first turned the light off, that worked, it was at 23:33:14.

2015-03-08 23:33:14.040 Info, mgr,     Manager::WriteConfig completed for 
driver with home ID of 0xeaafcc7f
2015-03-08 23:33:14.313 Info, Node002, Value::Set - COMMAND_CLASS_SWITCH_BINARY 
- Switch - 0 - 1 - False
2015-03-08 23:33:14.313 Info, Node002, SwitchBinary::Set - Setting node 2 to Off
2015-03-08 23:33:14.313 Detail, Node002, Queuing (Send) SwitchBinaryCmd_Set 
(Node=2): 0x01, 0x0a, 0x00, 0x13, 0x02, 0x03, 0x25, 0x01, 0x00, 0x25, 0x4e, 0xa8
2015-03-08 23:33:14.313 Detail, Node002, Queuing (Send) SwitchBinaryCmd_Get 
(Node=2): 0x01, 0x09, 0x00, 0x13, 0x02, 0x02, 0x25, 0x02, 0x25, 0x4f, 0xa8
2015-03-08 23:33:14.313 Detail, 
2015-03-08 23:33:14.313 Info, Node002, Sending (Send) message (Callback 
ID=0x4e, Expected Reply=0x13) - SwitchBinaryCmd_Set (Node=2): 0x01, 0x0a, 0x00, 
0x13, 0x02, 0x03, 0x25, 0x01, 0x00, 0x25, 0x4e, 0xa8
2015-03-08 23:33:14.321 Detail, Node002,   Received: 0x01, 0x04, 0x01, 0x13, 
0x01, 0xe8
2015-03-08 23:33:14.321 Detail, Node002,   ZW_SEND_DATA delivered to Z-Wave 
stack
2015-03-08 23:33:14.338 Detail, Node002,   Received: 0x01, 0x07, 0x00, 0x13, 
0x4e, 0x00, 0x00, 0x02, 0xa7
2015-03-08 23:33:14.338 Detail, Node002,   ZW_SEND_DATA Request with callback 
ID 0x4e received (expected 0x4e)
2015-03-08 23:33:14.338 Info, Node002, Request RTT 24 Average Request RTT 24
2015-03-08 23:33:14.338 Detail, Node002,   Expected reply was received
2015-03-08 23:33:14.338 Detail, Node002,   Message transaction complete
2015-03-08 23:33:14.338 Detail, 
2015-03-08 23:33:14.338 Detail, Node002, Removing current message
2015-03-08 23:33:14.338 Detail, 
2015-03-08 23:33:14.338 Info, Node002, Sending (Send) message (Callback 
ID=0x4f, Expected Reply=0x04) - SwitchBinaryCmd_Get (Node=2): 0x01, 0x09, 0x00, 
0x13, 0x02, 0x02, 0x25, 0x02, 0x25, 0x4f, 0xa8
2015-03-08 23:33:14.346 Detail, Node002,   Received: 0x01, 0x04, 0x01, 0x13, 
0x01, 0xe8
2015-03-08 23:33:14.346 Detail, Node002,   ZW_SEND_DATA delivered to Z-Wave 
stack
…

I then switched it on again the same second 23:33:15 :

2015-03-08 23:33:15.073 Info, mgr,     Manager::WriteConfig completed for 
driver with home ID of 0xeaafcc7f
2015-03-08 23:33:15.797 Info, Node002, Value::Set - COMMAND_CLASS_SWITCH_BINARY 
- Switch - 0 - 1 - True
2015-03-08 23:33:15.797 Info, Node002, SwitchBinary::Set - Setting node 2 to On
2015-03-08 23:33:15.797 Detail, Node002, Queuing (Send) SwitchBinaryCmd_Set 
(Node=2): 0x01, 0x0a, 0x00, 0x13, 0x02, 0x03, 0x25, 0x01, 0xff, 0x25, 0x50, 0x49
2015-03-08 23:33:15.797 Detail, Node002, Queuing (Send) SwitchBinaryCmd_Get 
(Node=2): 0x01, 0x09, 0x00, 0x13, 0x02, 0x02, 0x25, 0x02, 0x25, 0x51, 0xb6

And then it starts failing:

2015-03-08 23:33:21.001 Info, mgr,     Manager::WriteConfig completed for 
driver with home ID of 0xeaafcc7f
2015-03-08 23:33:21.265 Detail, Node002,   Received: 0x01, 0x07, 0x00, 0x13, 
0x4f, 0x01, 0x02, 0xb4, 0x13
2015-03-08 23:33:21.266 Detail, Node002,   ZW_SEND_DATA Request with callback 
ID 0x4f received (expected 0x4f)
2015-03-08 23:33:21.266 Info, Node002, WARNING: ZW_SEND_DATA failed. No ACK 
received - device may be asleep.
2015-03-08 23:33:21.266 Warning, Node002, WARNING: Device is not a sleeping 
node.

2015-03-08 23:32:01.502 Info, Node002, Received reply to 
FUNC_ID_ZW_GET_ROUTING_INFO
2015-03-08 23:32:01.502 Info, Node002,     Neighbors of this node are:
2015-03-08 23:32:01.502 Info, Node002,     Node 1
2015-03-08 23:32:01.502 Info, Node002,     Node 3
2015-03-08 23:32:01.502 Info, Node002,     Node 4
2015-03-08 23:32:01.502 Info, Node002,     Node 5
2015-03-08 23:32:01.502 Info, Node002,     Node 6
2015-03-08 23:32:01.502 Info, Node002,     Node 7
2015-03-08 23:32:01.502 Detail, Node002,   Expected reply was received
2015-03-08 23:32:01.502 Detail, Node002,   Message transaction complete
2015-03-08 23:32:01.502 Detail, 
2015-03-08 23:32:01.502 Detail, Node002, Removing current message
=> If it can find so many neighbours, could this plug still be too 
far from the USB controller stick?

The devices work for 4-5 days until one goes dead, then they all start 
going dead quite quickly (the next time I use any switch, they all turn 
dead).

Original comment by koen.r...@gmail.com on 9 Mar 2015 at 10:23

GoogleCodeExporter commented 9 years ago

The problem with log snippets is that too much is omitted. There are, 
unfortunately, lots of reasons that nodes report dead: RF interference, marginal 
RF distance, bugs in the Z-Wave code or even the library, or a poorly laid out 
topology. There are also node firmware incompatibilities depending on the 
version of the SDK the node was implemented in. Everyone gets to figure out 
which is the problem. There is no way for anyone to figure this out without 
hands-on access and experimentation.

Dead nodes were implemented to prevent the case where a node isn't going to 
respond and the library keeps trying to send it data, which prevents all the 
other working nodes from getting a timely response.

Raising the dead is, unfortunately, an exercise left for the reader.

My approach is to make sure your network is reliable and functional. Is the 
topology rational? Are there multiple paths to any one node? If not, how 
reliable are the single-source paths? How much can you figure out about RF 
interference? The round trip times (RTT) are key. Variable RTTs are indicative 
of interference or some pathing limits, maybe distance exceeded. If RTTs are 
consistent and the nodes just fail, and if they fail at roughly the same 
interval, you could be seeing a node firmware bug. Is there new firmware for 
these devices? Have you checked with the manufacturer to see what they think? 
Do all the nodes fail? If not, you get to figure out what the difference 
is. Sometimes the nodes along the path might be problematic. It could even be a 
controller firmware issue.
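
The RTT check described above can be roughed out in a few lines. This sketch takes the "Request RTT" values as they appear in an OZW log (collected by hand or by a script) and reports their mean and spread. The 0.5 ratio used to call a node unstable is an arbitrary assumption, and `SummarizeRtt` and `LooksUnstable` are hypothetical names, not part of OpenZWave.

```cpp
#include <cmath>
#include <vector>

// Mean and population standard deviation of a set of RTT samples.
struct RttStats {
    double mean;
    double stddev;
};

inline RttStats SummarizeRtt(const std::vector<double>& rtt) {
    if (rtt.empty()) {
        return {0.0, 0.0};
    }
    double sum = 0.0;
    for (double v : rtt) {
        sum += v;
    }
    const double mean = sum / static_cast<double>(rtt.size());

    double squares = 0.0;
    for (double v : rtt) {
        squares += (v - mean) * (v - mean);
    }
    return {mean, std::sqrt(squares / static_cast<double>(rtt.size()))};
}

// Assumed heuristic: flag the node when the spread exceeds `ratio` times the mean.
inline bool LooksUnstable(const RttStats& s, double ratio = 0.5) {
    return s.mean > 0.0 && s.stddev > ratio * s.mean;
}
```

A node whose samples are all close together would not be flagged here, while one that swings between tens and hundreds would be, which is the distinction between "consistent" and "variable" RTTs drawn above.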

Nodes that never send a message back to the controller will never be revived. 
How does the library know a node returns from the dead? The library can be used 
to send a "ping" message using Manager::TestNetworkNode. This routine should 
send a message to a dead node whereas the rest of the library won't. If the 
message succeeded from the controller's point of view at the Z-Wave protocol 
level, the controller command "has node failed" should return true and when it 
does, it should resurrect the dead node and the library should start talking to 
it again.
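
The ping approach described above could be sketched like this. The probe is passed in as a callback so the example stays self-contained. In a real application it would have to be built on `Manager::TestNetworkNode` and the notifications that follow, and that wiring is not shown. `Probe`, `ReviveDeadNodes` and the alive map are hypothetical names for illustration only.

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical probe: returns true when the node answered a no-op "ping".
// How the answer is obtained from the library is left to the application.
using Probe = std::function<bool(std::uint8_t nodeId)>;

// Probes every node currently marked dead (map value == false) and marks it
// alive when the probe succeeds. Alive nodes are never probed.
// Returns the number of nodes revived.
inline int ReviveDeadNodes(std::map<std::uint8_t, bool>& aliveByNode,
                           const Probe& probe) {
    int revived = 0;
    for (auto& entry : aliveByNode) {
        if (entry.second) {
            continue;  // already alive, leave it alone
        }
        if (probe(entry.first)) {
            entry.second = true;
            ++revived;
        }
    }
    return revived;
}
```

Calling something like this from a timer every few minutes would approximate the regular NOP request suggested earlier in the thread, while keeping traffic to dead nodes off the normal send path.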

We can't know all these issues with someone's network no matter how smart/good 
we might or might not be. Dead nodes are an exercise left for the user.

Original comment by glsatz on 10 Mar 2015 at 1:20

roshbaik2 / open-zwave

After a binary switch is dead, we can not recover it #190