wburgers / Open-Zwave-Socket-Server

Open-Zwave Socket Server
http://willemburgers.nl/zwave/open-zwave-socket-server
9 stars 8 forks source link

server hangs on start when node seems to be asleep #7

Closed metraport closed 8 years ago

metraport commented 8 years ago

Hello Willem,

i have an other issue on starting OZWS. A paired Danfoss living Connect is not responding at a new startup of OZWS. the Server is blocked and can't be started anymore. The only way to stop it, is a kill of the process.

Do you have an Idea how to remove the not responding node or set the Controller in remove Mode on startup ?

2015-09-22 15:24:23.784 Info, Node006, Sending (NoOp) message (Callback ID=0x0a, Expected Reply=0x13) - NoOperation_Set (Node=6): 0x01, 0x09, 0x00, 0x13, 0x06, 0x02, 0x00, 0x00, 0x25, 0x0a, 0xce 2015-09-22 15:24:23.795 Detail, Node006, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8 2015-09-22 15:24:23.795 Detail, Node006, ZW_SEND_DATA delivered to Z-Wave stack 2015-09-22 15:24:23.884 Detail, Node006, Received: 0x01, 0x07, 0x00, 0x13, 0x0a, 0x01, 0x00, 0x0a, 0xea 2015-09-22 15:24:23.884 Detail, Node006, ZW_SEND_DATA Request with callback ID 0x0a received (expected 0x0a) 2015-09-22 15:24:23.884 Info, Node006, WARNING: ZW_SEND_DATA failed. No ACK received - device may be asleep. 2015-09-22 15:24:23.885 Info, Node006, Node 6 has been marked as asleep 2015-09-22 15:24:23.885 Info, Node006, Node not responding - moving QueryStageComplete command to Wake-Up queue 2015-09-22 15:24:23.886 Detail, Node006, Notification: Notification - NoOperation 2015-09-22 15:24:23.886 Detail, Node006, Notification: Notification - Node Asleep

wburgers commented 8 years ago

Hi Mike,

I also have a Danfoss living connect radiator valve. This is a battery operated device, so it is not always on. The device decides when it wants to wake up from sleep. During startup, OZW (so not OZWSS) is trying to get the latest state from the devices. Some devices may not be awake during startup, so their state discovery is postponed to when they wake up. ("moving QueryStageComplete command to Wake-Up queue").

These messages should not have any impact on the startup of the server. Are you sure that the startup is blocked? For how long? Does the process continue after a timeout?

In older versions of OZW, the timeout for a message that could not be sent, was 40(!) seconds. This is reduced to 10 seconds I believe in one of the more recent versions. ("#define RETRY_TIMEOUT 10000" in defs.h of OZW).

The only drawback is that the state of the device may be wrong when you ask for the device state (with the ALIST command to OZWSS for example). When the device wakes up, this is solved.

Let me know if this clears things up for you.

Thanks for your report!

Best regards, Willem

metraport commented 8 years ago

Hi Willem,

the server is blocked for the whole time until i send a kill. I have waited for one hour with no response.

I have no chance to send a command because the Websocket was not started before.

But the Danfoss device is very strange. I have reseted now the the Controller and after reinclusion of the device, it seems to send data in a loop back to controller, until i take out the batteries.

wburgers commented 8 years ago

Hi Mike,

Sorry I didn't get back to you yesterday.

That is strange indeed. What kind of messages are sent in the loop? Is it the wake-up interval? Can you post a full log of the hanging startup over here?

What version of the danfoss valve are you using? My first danfoss valve was the old version (lc12), which would freeze every now and then. That is why I bought a second valve. A newer version, which works perfectly (lc13).

metraport commented 8 years ago

Hi Willem,

it's a LC-13.

In inclusion Mode the loop already started, here are one snippet of the message in loop:

2015-09-23 17:00:05.821 Detail, Node004, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8 2015-09-23 17:00:05.821 Detail, Node004, ZW_SEND_DATA delivered to Z-Wave stack 2015-09-23 17:00:05.839 Detail, Node004, Received: 0x01, 0x07, 0x00, 0x13, 0xad, 0x00, 0x00, 0x03, 0x45 2015-09-23 17:00:05.839 Detail, Node004, ZW_SEND_DATA Request with callback ID 0xad received (expected 0xad) 2015-09-23 17:00:05.840 Info, Node004, Request RTT 27 Average Request RTT 26 2015-09-23 17:00:05.840 Detail, Expected callbackId was received 2015-09-23 17:00:05.840 Detail, Expected reply was received 2015-09-23 17:00:05.840 Detail, Message transaction complete 2015-09-23 17:00:05.840 Detail, 2015-09-23 17:00:05.840 Detail, Node004, Removing current message 2015-09-23 17:00:05.841 Detail, 2015-09-23 17:00:05.841 Info, Node004, Sending (Send) message (Callback ID=0xac, Expected Reply=0x04) - WakeUpCmd_IntervalGet (Node=4): 0x01, 0x09, 0x00, 0x13, 0x04, 0x02, 0x84, 0x05, 0x25, 0xac, 0xeb 2015-09-23 17:00:05.849 Detail, Node004, Received: 0x01, 0x04, 0x01, 0x13, 0x01, 0xe8 2015-09-23 17:00:05.850 Detail, Node004, ZW_SEND_DATA delivered to Z-Wave stack 2015-09-23 17:00:05.866 Detail, Node004, Received: 0x01, 0x07, 0x00, 0x13, 0xac, 0x00, 0x00, 0x02, 0x45 2015-09-23 17:00:05.866 Detail, Node004, ZW_SEND_DATA Request with callback ID 0xac received (expected 0xac) 2015-09-23 17:00:05.866 Info, Node004, Request RTT 24 Average Request RTT 25 2015-09-23 17:00:05.866 Detail, Expected callbackId was received 2015-09-23 17:00:05.878 Detail, Node004, Received: 0x01, 0x0c, 0x00, 0x04, 0x00, 0x04, 0x06, 0x84, 0x06, 0x00, 0x03, 0x84, 0x01, 0xf1 2015-09-23 17:00:05.879 Detail, 2015-09-23 17:00:05.879 Info, Node004, Response RTT 37 Average Response RTT 36 2015-09-23 17:00:05.879 Info, Node004, Received Wakeup Interval report from node 4: Interval=900, Target Node=1 2015-09-23 17:00:05.879 Detail, Node004, Refreshed Value: old value=900, new value=900, type=int 2015-09-23 17:00:05.879 Detail, Node004, Changes to this value are not verified 2015-09-23 17:00:05.880 Detail, Node004, Expected reply and command class was received 2015-09-23 17:00:05.880 Detail, Node004, Message transaction complete 2015-09-23 17:00:05.880 Detail, 2015-09-23 17:00:05.880 Detail, Node004, Removing current message 2015-09-23 17:00:05.880 Detail, Node004, Notification: ValueChanged 2015-09-23 17:00:05.881 Info, Node004, Value::Set - COMMAND_CLASS_WAKE_UP - Wake-up Interval - 0 - 1 - 3600 2015-09-23 17:00:05.881 Detail, Node004, Queuing (WakeUp) WakeUpCmd_IntervalSet (Node=4): 0x01, 0x11, 0x00, 0x13, 0x04, 0x0a, 0x8f, 0x01, 0x01, 0x06, 0x84, 0x04, 0x00, 0x0e, 0x10, 0x01, 0x25, 0xaf, 0x6f 2015-09-23 17:00:05.881 Detail, Node004, Queuing (Send) WakeUpCmd_IntervalGet (Node=4): 0x01, 0x09, 0x00, 0x13, 0x04, 0x02, 0x84, 0x05, 0x25, 0xb0, 0xf7 2015-09-23 17:00:05.882 Detail, 2015-09-23 17:00:05.882 Info, Node004, Sending (WakeUp) message (Callback ID=0xaf, Expected Reply=0x13) - WakeUpCmd_IntervalSet (Node=4): 0x01, 0x11, 0x00, 0x13, 0x04, 0x0a, 0x8f, 0x01, 0x01, 0x06, 0x84, 0x04, 0x00, 0x0e, 0x10, 0x01, 0x25, 0xaf, 0x6

It seems to be, that open-zwave not responding right to LC-13.

Best regards,

Mike

metraport commented 8 years ago

I have tested it with razberry and Z-way, there is all fine after including.

wburgers commented 8 years ago

Hi Mike,

That loop is part of ozwss, not ozw. I have a temp sensor device that resets the wake-up interval when I replace the batteries. I made ozwss keep track of this setting, to only update when the user sends a command and not when the device wants to reset. (By sending the cached value back to the device.) This works fairly well, but for some reason gets caught in a loop sometimes. I believe that a manual wake-up of the device (by pressing the button on the device) may work to break the loop. Also, it may not be visible due to the massive amounts of output, but the server should start in this case, so you should be able to send a command. Sending 'SETNODE~6~Wake-up Interval=900' or whatever value you like, should also break the loop. I will try to recreate the loop conditions and hopefully fix it, but in the meantime, above options should be enough to help you with this issue.

Let me know!

Best regards, Willem

metraport commented 8 years ago

Sending 'SETNODE~6~Wake-up Interval=900' and reinsert batteries fixed the loop problem.

Thank you very much for this hint.

wburgers commented 8 years ago

I am glad that worked. I will still try to reproduce this problem and fix it eventually.

Thanks for your help!

metraport commented 8 years ago

So it will be a good idea, to send a 'Wake-up Interval=900' as standard after each inclusion process. What do you mean ?

wburgers commented 8 years ago

That should not be necessary.

Zwave devices that operate on batteries have a default wake-up interval. On startup of OZWSS, the server asks the device what the wake-up interval is currently and saves that. Only if a user updates the wake-up interval, will the server accept changes.

For some reason, the wake-up interval cache does not want a new device to set its default value as cached value.

So when I fix this bug, you should not see a loop at all and you don't have to send a command to exit the loop. It is not fixed yet, so if you see a loop of wake-up interval messages now, you should send the command for the time being to break the loop. This is just a workaround and not a fix for the problem.

Hope this clears things up.

Best regards, Willem

metraport commented 8 years ago

Sorry Willem , i have an other question ;-)

By starting the Websocket Client with your sample, the sended date were recogmized bei server.

WebSocket connection established Received websocket data: ALIST Received websocket data: ROOMLIST

But there will be no data sended back to client.

And additional another question, Is there a way to get the live data, ( e.g. a window contact was opened or a temperature change was sended from sensor ) without permanently send an ALIST Command ?

Thank you very much for your help,

Best regards,

Mike

wburgers commented 8 years ago

Hi Mike,

Please don't be sorry, I really appreciate your questions! This helps me improve OZWSS!

One request from my side though: Could you please create separate issues for these questions, such that we can discuss, track, answer and solve them individually.

To give you a quick answer already: 1a) I don't update the Websocket Client anymore. This client was a basic test for websockets, but I was very unhappy with my JQuery skills to create a nice looking client :-P. I came across Polymer and merged the functionality of the Websocket Client with the app layout of Polymer. The Polymer Client looks way nicer and works on both desktop and mobile, so that was covered as well. If you like the Websocket Client and want to use that, any update is welcome and please do a pull request ;-). I will only update the Polymer Client from my side.

1b) To fully answer your question: I changed the protocol on the server side to send back JSON messages to the client. The Websocket Client still expects the old format of messages, which had to be parsed manually and had no real error handling. The JSON messages can include error messages and other textual messages. The Polymer Client can handle these JSON messages, so if you want an example and improve the Websockets Client, that is a good starting place to look at.

2) No that is not possible (yet). If you want to, you can update the code for the UPDATE command that is sent from server to client. You can include per device changes in the message and parse them in the client to update the client status. I didn't have time to implement this yet, but an update without sending the complete state of the whole network is definitely an improvement. Again, a pull request is very welcome :-).

Best Regards, Willem

wburgers commented 8 years ago

So, after a month I finally had some time to determine what the problem is. This comment is more like a mental note...

When a device is added, all value instances are added to the nodeinfo struct. This also triggers the recreation of the wake-up interval cache. The value of the wake-up interval is read and stored in cache (value of 3600).

Then, the initial value from the device is read (value of 300). The cache does not agree with this "new" value and tries to set it back to the cached value. However, the maximal value of the wake-up interval is 900. So the cache keeps on trying to set a value higher than the max, so it keeps on looping. I have not found any other ways to get the cache looping.

I hope to work on a solution real soon. It should not be that hard to find the max value and not send a higher value to the device. I also want to postpone the cache recreation, such that it won't read 3600 (no idea why it does that).

wburgers commented 8 years ago

Hi Mike,

Please check the latest commit. I'm sorry it took so long. I've been pretty busy with other stuff. Let me know if you find any other issues!

Best regards, Willem