Open tkurki opened 1 year ago
Did some investigating. The offending program is mpstat, which can take up to 5s to return values. In the process the call somehow blocks canboatjs long enough for it to time out (and fail silently). FWIW, I recreated the failure with a test plugin whose spawn() call did nothing more than sleep for 5 seconds.
I'm not a Node.js expert, but per the Node docs child_process.spawn() does not block the event loop, so I'm not sure how or why the hiccup is occurring.
For the most part rpi-monitor (at 60s update intervals) won't cause issues with CAN0, but I cannot guarantee that will hold over longer run times. I recall losing my CAN0 sensors at sea once. At the time I chalked it up to WiFi failure, but now I suspect it was mpstat running inside rpi-monitor that did it.
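One way to check whether the event loop really stalls while a spawned child runs is to record event-loop delay around the spawn. The following is only a minimal sketch: the helper name measureLoopDelayDuring is hypothetical, and it assumes a Node.js version that provides perf_hooks.monitorEventLoopDelay. It is not the test plugin described above.

```javascript
const { spawn } = require('child_process');
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper: run a command and report the worst event-loop
// delay (in milliseconds) observed in this process while the child ran.
function measureLoopDelayDuring(command, args) {
  return new Promise((resolve, reject) => {
    // Sample the event-loop delay roughly every 10 ms.
    const histogram = monitorEventLoopDelay({ resolution: 10 });
    histogram.enable();

    const child = spawn(command, args, { stdio: 'ignore' });

    child.on('error', (err) => {
      histogram.disable();
      reject(err);
    });

    child.on('close', (code) => {
      histogram.disable();
      // histogram.max is reported in nanoseconds.
      resolve({ exitCode: code, maxDelayMs: histogram.max / 1e6 });
    });
  });
}
```

Calling measureLoopDelayDuring('sleep', ['5']).then(console.log) inside the server process would show whether the loop was held up for anything close to the child's run time. A small maxDelayMs would point away from the event loop itself and toward something else, such as scheduling or the CAN driver.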
Further investigation of canboatjs is warranted to explain why a non-blocking system call is causing CAN0 to fail. And I'd suggest that future plugins and base SK mods be wary of using spawn() for anything I/O intensive until we've found an answer.
I substituted a simple sleep call for mpstat in a modded rpi-monitor plugin and CAN0 went down after about 20 minutes when refresh interval was set to < 5.
This suggests that the problem is at the Node.js level or, more likely, somewhere within canboatjs.
So it appears that this is not a bug in signalk-rpi-monitor, but something in the node environment?
Is there a better way to get the CPU utilization data than spawning mpstat that might not have this problem?
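One possible alternative, offered only as a sketch, is to compute utilization from Node's built-in os.cpus() counters instead of spawning mpstat. The function names below are hypothetical, and the result is a single busy fraction rather than mpstat's full per-category breakdown. It avoids creating a child process, but whether that also avoids the CAN0 dropouts is untested.

```javascript
const os = require('os');

// Sum the per-core time counters (milliseconds) into one snapshot.
// The cpus parameter defaults to live data; it is injectable for testing.
function snapshotCpuTimes(cpus = os.cpus()) {
  let idle = 0;
  let total = 0;
  for (const cpu of cpus) {
    for (const ms of Object.values(cpu.times)) {
      total += ms;
    }
    idle += cpu.times.idle;
  }
  return { idle, total };
}

// Busy fraction (0..1) between two snapshots; 0 if no time has elapsed.
function cpuUtilization(before, after) {
  const totalDelta = after.total - before.total;
  if (totalDelta <= 0) return 0;
  const idleDelta = after.idle - before.idle;
  return 1 - idleDelta / totalDelta;
}

// Hypothetical sampler: calls onSample(busyFraction) every intervalMs.
function startCpuSampler(intervalMs, onSample) {
  let previous = snapshotCpuTimes();
  const timer = setInterval(() => {
    const current = snapshotCpuTimes();
    onSample(cpuUtilization(previous, current));
    previous = current;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return timer;
}
```

A plugin could call startCpuSampler(60000, (busy) => { /* publish busy * 100 as a percentage */ }) and stop it with clearInterval on the returned timer.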
If not, then is it okay to close this issue?
To the best of my knowledge, the bug is somewhere deeper in the SK/Node.js environment, but for now any program calling spawn() frequently enough will bring down canboatjs CAN0 connections (at least those mediated by an MCP2515 controller; I am currently testing others).
@tkurki opened the issue, but I'd say with confidence it isn't a signalk-rpi-monitor issue and you should feel free to close it.
For now, please refer anyone who reports the problem to a solution outlined here.
https://github.com/SignalK/signalk-server/issues/1626