Open ColColonCleaner opened 1 year ago
Forgot to upload the ZUI log. Here it is: z-ui_2023-01-24.log
Think I found the reason. Exceptions while calling a node are not caught, and they also aren't displayed on the UI when the 'network graph' page is open. Setting this up with configs I expected to fail part-way through, i switched back to the main dashboard and that's where the exception was displayed when the health check broke. Seems I'm having some decode issues. But since exceptions (just this specific one perhaps) aren't caught or displayed on the page I didn't realize it was due to an error until now.
Is it reasonable to expect that the health check would mark the ping as failed and perhaps list the unique reasons they failed at the end instead of silently cancelling the health check?
Apart from the UI issue not showing errors, @AlCalzone any clues?
Please make a driver log, loglevel debug
and attach it here as a file.
This should give us a better idea why it is failing.
I ran health checks with varying quantities of runs and varying nodes. Logs attached. z-ui_2023-01-25.log zwavejs_2023-01-25.log
Ok I think I've identified a potential issue. Not sure if my planned fix is correct, because the command sequence doesn't really add up. https://github.com/zwave-js/node-zwave-js/issues/5366
Based on what you described in that issue this may be a linchpin for the stability of my network because I see dropped commands/messages a lot, along with the decode failure message. Or is the decode failure a separate problem entirely?
Not sure if that's your actual issue, but the sequence here is as follows:
that second thing is what I find weird. In any case we should probably just ignore it in that situation, because it causes the outgoing command to be re-transmitted immediately, leading to another "I didn't understand that".
Interesting. If you have a way of pushing out a beta version with that change I'd be happy to try it and send logs.
Checklist
Deploy method
Docker
Z-Wave JS UI version
8.7.0
ZwaveJS version
10.3.1
Describe the bug
The network graph's 'node health check' feature fails to complete its runs a good portion of the time when a run count other than 5 or 1 are used. For run counts of 5 and 1 it works most of the time and completes all runs, but for other values it cancels part way through most of the time and I'm not sure why. For values of 9 or higher it almost never starts. I got it to work once with a run count of 10, out of many attempts.
To Reproduce
Run node health checks with run counts above 10 for the quickest reproduction, but any values other than 5 or 1 I've noticed have a large chance of failing to complete all runs or sometimes any runs at all.
Expected behavior
Expected the health checks to complete for the number of runs entered.
Additional context
Here is a video showcasing the issue: https://www.youtube.com/watch?v=PGocFe9YRCE