Closed mundschenk-at closed 1 year ago
👋 Hey @mundschenk-at!
It looks like you attached a logfile, but its filename doesn't look like it a driver log that came from Z-Wave JS. Please make sure you upload the correct one.
It looks like you attached a logfile, but its filename doesn't look like it a driver log.
It is.
For me, the logs look like the nodes don't get the expected acknowledgement from the controller and retry sending. Also there are some instances where the controller can't reach a node while the same node is sending reports to the controller, which would support my suspicion further.
I don't see any mention of you doing a network heal. Even if the backup restores the known routes, the physical differences between sticks may result in incorrect routes being used for attempting communication with the nodes.
I had not done a network heal when that log file was created, no. I've since started one, takes some time for the battery powered nodes. I'll upload tomorrow's logfile which should be mostly after the healing (only the smoke detector is not done yet, not sure when it's scheduled for its daily wakeup).
Addendum: According to the network graph, range still appears to be much worse with 700 series (though it's been advertised as having an improved range). I think I mentioned this previously in another ticket, I've a Strips Guard 700 on a window in the same room as the controller, maybe 5 m distance that takes 2 hops with the 700 series and only 1 hop with the 5gen+. This might explain the issues before the network heal (but not why range is so abysmal with the newer stick).
Here's today's log, @AlCalzone. The network heal finished at 01:00 with the last remaining node (45 - the others were healed before the log turnover), but unfortunately to me it looks very similar (lots of invalid payloads/duplicate commands): zwavejslog_20220313.zip
Not sure if we went through this already - did you follow the instructions for controller placement mentioned here? https://zwave-js.github.io/node-zwave-js/#/troubleshooting/connectivity-issues?id=general-troubleshooting
We didn't, but I know about it and the stick's attached to a 1,5 m extension cord (above a wooden door frame, so no metal surface nearby either).
Ok. What bugs me a bit are the low RSSI values (-88...-100 dBm), which are probably not much above the background noise, and they gradually get worse. Maybe this is less of a controller placement issue than a mesh strength issue.
You can see that at least node 18 has a stronger connection than most others:
2022-03-13T09:11:21.169Z DRIVER « [REQ] [SendDataBridge]
callback id: 228
transmit status: OK, took 100 ms
repeater node IDs: 18, 10
routing attempts: 1
protocol & route speed: Z-Wave, 100 kbit/s
ACK RSSI: -89 dBm
^^ measured by the controller
ACK RSSI on repeaters: -63 dBm, N/A
^^ ^^^ measured by node 10
^^ measured by node 18
ACK channel no.: 0
TX channel no.: 0
In zwavejs2mqtt, on the network map, you have the ability to test the connection quality to individual nodes. I think this is your best bet now - identify nodes with a bad connectivity (might be the repeaters, not necessarily the "problematic" nodes) and try to optimize their connection.
You may also try to increase the TX power of the controller. At least partially your issue is caused by the nodes not being able to "hear" the controller. Note that this doesn't help if the controller can't "hear" some nodes though.
How would increase the controller's TX power? The health check on the network map is not working for me (it rapidly swamps the network, similar to https://github.com/zwave-js/node-zwave-js/issues/4130).
This is a relatively small appartment and the non-battery nodes are fairely evenly spaced. Node 18 is in the same direction as Node 10, but the latter is nearer to controller (basically they are located on opposite sites of the same wall: 1 -- wall -- "empty" room -- 10 -- wall -- 18 -- other room).
For comparison, this is the log from a few days ago using the Gen5+ controller (also Aeotec). Everything is the same except the controller (i.e. same nodes and placement): zwavejslog_20220309.zip
How would increase the controller's TX power?
This is a bit tricky, since no UI exposes it AFAIK. You can do it via code though:
I suggest the following:
await driver.controller.getPowerlevel()
await driver.controller.setPowerlevel(
3, // the normal powerlevel plus 3 (or so, need to test)
0 // the output power at 0 dBm, unchanged!
)
I would start with a slight adjustment of the normal powerlevel, e.g. +3 dBm. Note that increasing the powerlevel of a radio device may be against the law, so don't go overboard.
The health check on the network map is not working for me (it rapidly swamps the network
Any chance I could see a log of that?
I would start with a slight adjustment of the normal powerlevel, e.g. +3 dBm. Note that increasing the powerlevel of a radio device may be against the law, so don't go overboard.
What's normal here? GetPowerlevel
returned these values:
command: GetPowerlevel
normal powerlevel: 3.2 dBm
output power at 0 dBm: 14.8 dBm
The health check on the network map is not working for me (it rapidly swamps the network
Any chance I could see a log of that?
I've triggered the health check a few minutes ago: zwavejslog_20220314.zip
I've triggered the health check a few minutes ago
Ok that's https://github.com/zwave-js/node-zwave-js/issues/4075
What's normal here? GetPowerlevel returned these values
I'm trying to find that out. Unfortunately my test stick has very strange values there. Will let you know once I've found the defaults
What's normal here? GetPowerlevel returned these values
I'm trying to find that out. Unfortunately my test stick has very strange values there. Will let you know once I've found the defaults
Is the power level of a Gen5 stick comparable (or interesting)? Then I could switch back to my other stick.
Also, regarding network strength, do you see similar issues in the Gen5 log or is it the 700 series stick? (To me, things seem better with the Gen5 stick, but I'm not that well-versed in reading the logs.)
From what I've seen in the NVM data, the settings for 500 and 700 series sticks aren't comparable.
In your Gen5 log, I don't see RSSI for incoming messages, but the ACKs have RSSI values in the 50s/60s, roughly 30-40 dBm higher than on the 700 series. This could be caused by incorrect settings on the 700 series stick.
Start by setting the TX power to:
normal powerlevel: 0
measured: +3.3
which are the default settings. For good measure also set the RF region to EU, you never know...
await driver.controller.setRFRegion(0)
@mundschenk-at It might be that you're also affected by this bug: https://github.com/zwave-js/node-zwave-js/issues/4367 In that case you won't be able to change the settings on your own without editing the NVM backup first. I can do that for you if you send me the NVM backup via e-mail.
@mundschenk-at It might be that you're also affected by this bug: #4367 In that case you won't be able to change the settings on your own without editing the NVM backup first. I can do that for you if you send me the NVM backup via e-mail.
Yes it definitely looks that way:
2022-03-15T15:12:04.787Z DRIVER « [RES] [SerialAPISetup]
command: GetRFRegion
region: Unknown
I'll edit the backup myself, should not be a problem.
Sure, see here for details: https://github.com/zwave-js/node-zwave-js/issues/3906#issuecomment-1068031298
I tried this: Made a new 700 series backup, converted to JSON and replaced "rfConfig": null
with the object from https://github.com/zwave-js/node-zwave-js/issues/3906#issuecomment-1068031298. It worked, in that GetRFRegion
returns Europe
and the power levels are now 0.0 dBm
and 3.3 dBm
.
However, the whole network started to be reinterviewed immediately. As a result, several nodes had incomplete information and/or security classes. I manually re-interviewed them (using the "reset security classes" option) and was able to fix most of them (two are missing, one is a smoke detector that I would need a ladder to wake up, the other an Aeotec Water Sensor 6, which is quite buggy when using S0
security - I may have to reinclude that, maybe dropping security this time). Unfortunately, even after this, things are not really good.
I've tried starting a network heal, but that has stalled after just 4 nodes (i.e. not even all mains powered nodes have been healed).
Unfortunately, even after this, things are not really good. I've tried starting a network heal, but that has stalled after just 4 nodes .
Can you post another log so I can see for myself? You don't have to encrypt them, unless you re-include a secure device and/or change user codes while you're logging.
Unfortunately, even after this, things are not really good. I've tried starting a network heal, but that has stalled after just 4 nodes .
Can you post another log so I can see for myself? You don't have to encrypt them, unless you re-include a secure device and/or change user codes while you're logging.
Driver restarts as well (home ID), no? Since these are pretty long logs (whole day), I feel more comfortable with encrypted uploads.
The home ID is just a number that's transmitted in plaintext via radio anyways. If someone is standing near your network with a Zniffer, they will see it. Encrypted is fine for me though.
Here it is: zwavejslog_20220315.zip
(Sorry, a bit messy due to repeated re-interview/heal attempts.)
From what I'm seeing in the logs, the heal is slowly progressing. You had a full interview of node 99 going on in parallel, which delays the process quite a bit.
The RSSI values didn't change, but that's expected, since you didn't change anything physical. I guess now you can try this and see if the nodes can "hear" the controller better then.
The RSSI values didn't change, but that's expected, since you didn't change anything physical. I guess now you can try this and see if the nodes can "hear" the controller better then.
Just to be clear, do you mean I should initially try 3, 0
as the parameters (as in the example), or rather 3, 3.3
(3
added to the previous "normal powerlevel" of 0.0
, 0
added to the previous "output power at 0 dBm`)?
I meant 3, 3.3
.
Here's the new log. I think it is somewhat better, but node 10 still appears to be problematic (though I don't know why, it's a FGWP102, though you might consider it a legacy device - I didn't know how to read the specifications properly when ordering my first nodes).
Also, when doing a health check for node 11, node 80 started freaking out completely. This one is a bit remote (at the bottom of a cupboard at the other end of the flat), but there are plenty of repeaters in between if necessary.
Hm you're still running into the infinite healthcheck loop. Probably best if you wait for the next zwavejs2mqtt beta release, this will include a few fixes for this.
Not sure what's going on with Node 80 though. That is excessive and might warrant a bug report with the manufacturer. Such spamming can bring down the entire network.
Yes, but some health checks were successful (i.e. for nodes 18, 70, 100). Any improvements visible over the previous logs (signal strength-wise)? I have a hard time reading/comparing those.
Node 80 is an Aeotec smart plug set to report pretty frequently so that I know when a relative low-power device is on. I've never seen it spam the network this way, though. To me it appears to start with 2022-03-15T21:36:21.864Z CNTRLR « [Node 080] Beginning Transport Service RX session #14...
, but I don't see what's triggering it.
Yes, but some health checks were successful (i.e. for nodes 18, 70, 100). Any improvements visible over the previous logs (signal strength-wise)? I have a hard time reading/comparing those.
We're not going to see improvements in the signal strength. Making the stick "shout louder" doesn't make it "hear better". My hope was that the higher power leads to the acknowledgements reaching the nodes, so they stop re-transmitting the messages over and over.
The health checks are meant as a tool for you to identify nodes with problematic connectivity and potentially re-arrange a few nodes to improve on that. The next zwavejs2mqtt beta release will include fixes for the infinite loops and also show the actual routes used when you click on a node in the network map.
Yes, but some health checks were successful (i.e. for nodes 18, 70, 100). Any improvements visible over the previous logs (signal strength-wise)? I have a hard time reading/comparing those.
We're not going to see improvements in the signal strength. Making the stick "shout louder" doesn't make it "hear better". My hope was that the higher power leads to the acknowledgements reaching the nodes, so they stop re-transmitting the messages over and over.
Sorry, I worded that wrong. But did we see such an improvement? Otherwise, it appears that Z-Stick 7's antenna design (or something) is simply suboptimal compared to the Z-Stick Gen5+.
I don't think its much better. The routing reports almost always indicate the same nodes though where the route fails: 18 -> 69, 69 -> 80 and 18 -> 80. Might be worth looking into.
That's weird. 18 and 69 are in completely different rooms with lots of stuff in between (walls and kitchen appliances) and no way the best routes to take. Same for 69 and 80.
18 -> 80 is somewhat legit (same room), although "80" is the leaf node, location-wise. The arrow indicates the direction, i.e. 18 hops to 80 which should hop to the controller? Or is it the other way around?
Or is it the other way around?
It is. These reports indicate the routing from the controller to the nodes.
18 and 69 are in completely different rooms with lots of stuff in between (walls and kitchen appliances) and no way the best routes to take. Same for 69 and 80.
Try doing single node heals for those (and other potential nodes along the routes), step by step. Maybe this helps the controller figure out better routes.
I've tried healing individual nodes in between, but the routes have not become any better as far as I can see. Yesterday, I switched back to the 500 series stick, which seems to work much better overall. (I am helping Sensative debug the Strips Guard 700 firmware, and I don't want to change too many variables at the same time). I'll raise a ticket with Aeotec what's going on with their 700 series sticks.
Just so that I read the logs correctly, the ACK RSSI
in this example is what the controller receives from the repeater node (70
), and ACKK RSSI on repeaters
is what the repeater node received from the target node?
2022-03-18T01:37:12.129Z DRIVER « [REQ] [SendData]
callback id: 61
transmit status: OK, took 90 ms
repeater node IDs: 70
routing attempts: 1
protocol & route speed: Z-Wave, 40 kbit/s
ACK RSSI: -68 dBm
ACK RSSI on repeaters: -91 dBm
ACK channel no.: 1
TX channel no.: 1
That's almost correct. For the example you showed it is, but if there are multiple repeaters, you're going to have multiple measurements in the "on repeaters" line. E.g. if the route is (1 -> 70 -> 71 -> 68), then you'll see (for example)
repeater node IDs: 70, 71
ACK RSSI: -68 dBm
ACK RSSI on repeaters: -91 dBm, -88 dBm
where -88 is the RSSI of the ACK that 71 received from 68 and -91 is the RSSI of the ACK that 70 received from 71 and -68 is the RSSI of the ACK that the controller received from 70
I'll raise a ticket with Aeotec what's going on with their 700 series sticks.
Let me know what they say.
Small follow-up question: What's a ”reasonable” RSSI? -91 dBm
is pretty low, too low? (As in: one should do something about it, like adding a repeater at another location.)
AFAIK this depends on the background RSSI. The higher that is, the higher the RSSI should be. The lifeline health check is modeled after what Silabs' PC Controller does and they recommend an RSSI margin (<RSSI> - <background RSSI>
) of 17 dBm or more.
If you're interested in the background RSSI at the controller (it can't be measured at nodes), you can execute a driver function and watch the logs:
await driver.controller.getBackgroundRSSI()
Is there a driver command to dump the routing tables? Aeotec asked if I could provide those.
The current state is included in the NVM backup. Make a fresh one, then convert it to json and copy all entries for lwr
and nlwr
.
This issue has not seen any recent activity and was marked as "stale 💤". Closing for housekeeping purposes... 🧹
Feel free to reopen if the issue persists.
I am still waiting on meaningful response from Aeotec. Their first level support has made some suggestions for a test setup, but it's not practical investment wise and I don't think they have looked at the provided logfiles at all.
@AlCalzone At Aeotec's advice I've created a test setup with just the 700 controller and a single node. Today I've run health checks with the node in increasing distance from the controller (in 1 m intervals). When I couldn't do straight line-of-sight anymore due to space constraints, I put the node about a further 2 m away, with a thin wall and some furniture (speakers) in between. (That's the measurement from 2022-04-23T09:19:45.952Z
). Then I made an NVM backup to have a look at some of the internal settings after the factory reset.
Some time later I did a health check again without moving either the controller or the node. The results were pretty weird, as GetRoutingInfo
now returned loads of neighbors (in a network that never had more than two nodes, and with node numbers not corresponding to any nodes of the ”real” network running on the gen5+ stick either). Is this the controller malfunctioning completely? Some weird interaction because of the backup?
zwavejs_2022-04-23_weird_routing.log
I'm running the test environment on an RPi4 with docker (node-zwave-js 9.0.4
).
The weird neighbors look like random data from the controller memory IMO. This information is stored as a bitmask and maybe the controller is reading the data from a weird location with uninitialized/random memory.
I'm not sure why a backup would cause this though, since it is just reading.
Maybe it was not the backup but an incomplete reset (since GetRFRegion returned unknown
)?
Could be... if you convert the JSON I sent you back to a binary NVM file and restore that, does GetRFRegion
return something useful again?
Is your problem within Home Assistant (Core or Z-Wave JS Integration)?
NO, my problem is NOT within Home Assistant or the ZWave JS integration
Is your problem within ZWaveJS2MQTT?
NO, my problem is NOT within ZWaveJS2MQTT
Checklist
[X] I have checked the troubleshooting section and my problem is not described there.
[X] I have read the changelog and my problem was not mentioned there.
Describe the bug
Yesterday, after updating the stick to FW 7.17.2.0, I restored a current NVM backup from my 500 series stick. The driver started up fine and the nodes seem to work OK. Unlike previous attempts, there were no dead nodes and no automatic re-interviews.
However, when checking the log files this morning, I noticed that it does contain a lot of "Duplicate command" errors, and not (just) from the Sensative Strips Guard 700. Could this be a side effect of a "Health Check" attempt I made (under the erroneous assumption that certain issues had been fixed since the last time I tried)? I did restart the driver afterwards though, to stop the network flooding.
Device information
Manufacturer: Aeotec Model name: Z-Stick 700 Node ID in your network: 1
How are you using
node-zwave-js
?zwavejs2mqtt
Docker image (latest)zwavejs2mqtt
Docker image (dev)zwavejs2mqtt
Docker manually built (please specify branches)ioBroker.zwave2
adapter (please specify version)HomeAssistant zwave_js
integration (please specify version)pkg
node-red-contrib-zwave-js
(please specify version, double click node to find out)Which branches or versions?
version:
node-zwave-js
branch: 8.11.6zwavejs2mqtt
branch: 6.5.2zwavejs2mqtt community addon
: 0.36.0Did you change anything?
yes (please describe)
If yes, what did you change?
Switched back to Z-Stick 700
Did this work before?
Yes (please describe)
If yes, where did it work?
With Z-Stick 5gen+
Attach Driver Logfile
zwavejslog_20220312.zip