node-red / node-red-nodes

Extra nodes for Node-RED

[Daemon node] write can throw and is not caught, crashes NR #960

Open tve opened 2 years ago

tve commented 2 years ago

Which node are you reporting an issue on?

Daemon

What are the steps to reproduce?

Unknown. Happens sporadically due to unreliable wifi links.

What happens?

Node-RED crashes

15 Nov 10:45:00 - [warn] [daemon:ssh] Restarting : ssh                                             
15 Nov 10:45:01 - [red] Uncaught Exception:                                                        
15 Nov 10:45:01 - [error] Error: write EPIPE                                                       
    at afterWriteDispatched (node:internal/stream_base_commons:160:15)                             
    at writeGeneric (node:internal/stream_base_commons:151:3)                                      
    at Socket._writeGeneric (node:net:874:11)                                                      
    at Socket._write (node:net:886:8)                                                              
    at writeOrBuffer (node:internal/streams/writable:392:12)                                       
    at _write (node:internal/streams/writable:333:10)                                              
    at Writable.write (node:internal/streams/writable:337:10)                                      
    at DaemonNode.inputlistener [as _inputCallback] (/data/node_modules/node-red-node-daemon/daemon.js:44:81)
    at /usr/src/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:210:26                   
    at Object.trigger (/usr/src/node-red/node_modules/@node-red/util/lib/hooks.js:166:13)          

As of a recent commit, line 44 of daemon.js is now line 53: https://github.com/node-red/node-red-nodes/blob/master/utility/daemon/daemon.js#L53

What do you expect to happen?

I expect the daemon node to catch the exception and handle it gracefully without taking all of NR down.

Please tell us about your environment:

15 Nov 10:45:16 - [info] Node-RED version: v3.0.2                                                  
15 Nov 10:45:16 - [info] Node.js  version: v18.7.0                                                 
15 Nov 10:45:16 - [info] Linux 5.15.0-46-generic x64 LE                                            

hardillb commented 2 years ago

Just to attempt to make this clearer.

If I've understood correctly you are using the daemon node to run an ssh session to a remote host and the wifi outage is causing this to exit.

It looks like the node is trying to write to the now nonexistent stdin of the dead ssh process.

dceejay commented 1 year ago

@tve Could you try adding a try/catch around the write in that if statement? (You are best placed to recreate the error.)
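
For illustration, the suggestion above could look roughly like the sketch below. `safeWrite` is a hypothetical helper, not the daemon node's actual code. A try/catch only covers synchronous throws; stream errors are generally delivered as 'error' events, so this guard complements an 'error' listener on the stream rather than replacing it.

```javascript
// Hypothetical helper, not the daemon node's actual code.
// Writes to a stream only if it still looks usable, and never throws.
function safeWrite(stream, data, onError) {
    // A destroyed or ended stream reports writable === false.
    if (!stream || stream.destroyed || !stream.writable) {
        return false;
    }
    try {
        stream.write(data);
        return true;
    } catch (err) {
        // Only synchronous throws from write() land here.
        if (typeof onError === 'function') { onError(err); }
        return false;
    }
}
```

The boolean return lets the caller decide whether to drop the message or report it, instead of letting the exception propagate into the runtime.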

dceejay commented 1 year ago

having looked at the code I must say I'm slightly baffled - as soon as we spawn the command we set up an error handler (Line 117) - so for some reason that isn't getting called... more eyes on please.

edit - Aha - found this https://stackoverflow.com/questions/67296866/node-js-an-uncatchable-error-is-thrown-when-the-child-process-is-abruptly-close

so I'll add an error handler specifically to the stdin...
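
A minimal sketch of that idea, assuming a child created with `child_process.spawn()`. `guardChild` and the `onError` callback are hypothetical names for illustration, and this is not necessarily how the published fix is written. The point is that an 'error' event emitted on `child.stdin` with no listener attached is thrown, which is what surfaces as an uncaught exception.

```javascript
// Hypothetical helper, not the code shipped in node-red-node-daemon.
// `child` is expected to be a ChildProcess returned by child_process.spawn().
function guardChild(child, onError) {
    // Errors on the process itself, e.g. the command could not be spawned.
    child.on('error', (err) => onError('process', err));
    // Errors on the stdin pipe, e.g. EPIPE. These are emitted on the stream,
    // so a listener on the child alone does not see them.
    if (child.stdin) {
        child.stdin.on('error', (err) => onError('stdin', err));
    }
    return child;
}
```

Inside a node this might be called as `guardChild(spawn(cmd, args), (source, err) => node.warn(source + ': ' + err.message))`, where `cmd` and `args` stand in for whatever the node is configured to run.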

dceejay commented 1 year ago

@tve - published version 0.5.1 for you to try

tve commented 1 year ago

Thanks, will try it out!

dceejay commented 1 year ago

@tve - any feedback? OK to close?

tve commented 1 year ago

Sorry, 'been slow... I just rebuilt the NR container and relaunched. It was crashing at least once a day, so should know soon!

tve commented 1 year ago

No issue so far, I'll close the ticket, can always reopen if necessary. Thanks for the fabulous turn-around time!!

tve commented 1 year ago

Hmmm, I don't know whether this is truly related, but it looks suspiciously so. Same system crashed NR on an EPIPE, apparently inside node.js itself:

3 Dec 09:25:07 - [red] Uncaught Exception:                              
3 Dec 09:25:07 - [error] Error: write EPIPE                             
    at WriteWrap.onWriteComplete [as oncomplete] (node:internal/stream_base_commons:94:16)                                                      
***** NODE-RED STARTING ***** Sat Dec 3 09:25:16 PST 2022               

The previous log message was unrelated and 30 seconds prior. The immediate NR restart is by docker. This happened ~5 hours prior as well (no additional info in the log), but didn't happen in the previous 4 days. Not sure how to troubleshoot this...

dceejay commented 1 year ago

If it throws an error asynchronously then the recommended thing to do is restart, which is what we do. As this is node-internal I'm not sure what we can do about it.

tve commented 1 year ago

Do you have suggestions for how to troubleshoot this? NR crashed 8 times yesterday due to this and 3 times today. I have to find some solution... I'm running node 18.7.0, upgrading to 18.12.1 now...

dceejay commented 1 year ago

Generally an EPIPE error means that the other end closed the connection unexpectedly... so I would look there.
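
One way to "look there" might be to timestamp when the child process exits and what it printed on stderr, so an EPIPE in the Node-RED log can be lined up against the remote end going away. The sketch below assumes a child created with `child_process.spawn()`; `traceChild` and `log` are hypothetical names, not part of the daemon node.

```javascript
// Hypothetical diagnostic helper; names are illustrative only.
// `log` is any function that accepts a string.
function traceChild(child, log) {
    // 'exit' reports either an exit code or the signal that ended the child.
    child.on('exit', (code, signal) => {
        log(`${new Date().toISOString()} child exited code=${code} signal=${signal}`);
    });
    // Whatever the command printed before dying often explains the close.
    if (child.stderr) {
        child.stderr.on('data', (chunk) => {
            log(`${new Date().toISOString()} stderr: ${String(chunk).trimEnd()}`);
        });
    }
    return child;
}
```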