n8n-io / n8n

Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
https://n8n.io
Other
47.37k stars 7.1k forks source link

Ssh node crashes the node process with an unhandled exception #2907

Open n1xx1 opened 2 years ago

n1xx1 commented 2 years ago

Describe the bug A disconnect error inside the ssh2 library isn't handled by node-ssh and n8n.

To Reproduce Not exactly the easiest thing to reproduce since it depends of the actual ssh implementation you use on the server. I haven't had any problems with linux servers, but I have issues with windows ones.

Expected behavior If a ssh error happens outside of the ssh flow, it shouldn't crash the node process with an unhandled exception.

Environment (please complete the following information):

Additional context I did open an issue here: https://github.com/steelbrain/node-ssh/issues/421 since looking at node-ssh it doesn't export a way to add an error handler.

Joffcom commented 2 years ago

Hey @n1xx1,

Does it crash N8N or does the workflow just error out?

n1xx1 commented 2 years ago

Hello @Joffcom, sorry I wasn't clear enough. It's the workflow process crashing. When I open the execution in the web ui I get "Workflow execution process did crash for an unknown reason!" and no execution data. I did some testing and it looks like it happens when the SSH nodes aren't the last nodes to be executed in the worflow.

Joffcom commented 2 years ago

Thanks for clearing that up @n1xx1, Looking at the node we follow our normal try / catch on the execution and it looks like this is failing and knocking out the workflow process before it can even be caught which isn't good.

It looks like we may have to wait for the node-ssh project to resolve the issue though, I will double check with our core team to see if they have any thoughts on this.

Joffcom commented 2 years ago

Hey @n1xx1,

Had a chat with one of the core chaps here and it looks like as we do try and handle the errors and the process under us it will be a case of waiting for a change to node-ssh or if we can find a way to reproduce the issue reliably we can dig deeper into it.

n1xx1 commented 2 years ago

Something that could be done in n8n is to handle https://nodejs.org/api/process.html#event-uncaughtexception Maybe if the err.code is "ECONNRESET" and if in the err.stack there is something related to ssh2, otherwise kill the process as normal.

I know it's really ugly as a solution, but maybe you can register it only if you use a Ssh node, and not globally. Once the node-ssh issue is resolved it can be removed safely.

Joffcom commented 2 years ago

My worry there would be that not only is it ugly and not a recommended solution there is likely to be a good chunk of work that needs to go into implementing and testing it properly, That and we would have to run this through our prioritization process by that time the node-ssh library may have a fix in place.

If you happen to have a simple workflow that always hits this error and the SSH Server implementation that is running on the Windows box we can start to dig into it a bit more.

kalwinskidawid commented 2 years ago

Hey, any progress on resolving the problem?

n1xx1 commented 2 years ago

Hi @kalwinskidawid, the thing I've done in my specific workflow is that I run this javascript function in the first step.

const process = require("process");
process.on('uncaughtException', err => { console.error(err, 'Uncaught Exception thrown') });
return items;

Note that it requires these quite unsafe env variables to be set: NODE_FUNCTION_ALLOW_BUILTIN=* and NODE_FUNCTION_ALLOW_EXTERNAL=*. And it will only work if you are running in EXECUTIONS_PROCESS=own mode (default).

Joffcom commented 2 years ago

The good news here is I have been looking into this a bit more recently and I have some ideas I want to implement to see if they help.

gabeyww commented 1 year ago

I have this same issue it looks like. I'm running a workflow with multiple SSH calls to a Windows SSH server. It executes the first one fine, but immediately crashes when it gets to the subsequent one.

Joffcom commented 1 year ago

@gabeyww that is handy to know it is still an issue, We did some initial work on looking into it and it looked like the underlying issue is a bit of a problem with the package we use but our internal ticket is still open for this one which we are tracking as n8n-4629