mikakaraila / node-red-contrib-opcua

A Node-RED node to communicate OPC UA. Uses node-opcua library.

OPC UA node not re-establishing connection after error #576

Open ErosBaixauli opened 1 year ago

ErosBaixauli commented 1 year ago

Hello there,

We have a Node-RED deployment that operates and monitors production on several machines via OPC UA. [image]

Recently, after a problem with the power and network connections on site, we realized that the OPC UA connection nodes were stuck with the following error:

[image: screenshot of the error]

A quick flow restart solved the issue and everything started working again, but we need this to happen automatically. These machines can be shut down from time to time, so I don't want to restart the Node-RED Docker container every time the PLC doesn't answer requests.

So, I have some questions:

Thanks in advance for the help!

mikakaraila commented 1 year ago

1) There is an example in the examples folder, in the file OPCUA-TEST-NODES.json, showing how to inject disconnect / reconnect.
2) Catch would be optimal, but I cannot remember whether the client node reports errors in a way that can be caught => TODO/FIX
3) See the first answer.

ErosBaixauli commented 1 year ago

Thanks for the fast reply.

Yes, after I posted the issue I found the examples and learned about msg.action="reconnect".

I just set up a test to simulate the customer's issue and the reconnection option, where the connection is down for a long time and then the PLC recovers. For a while, the node itself tries to connect, throwing this error:

[image: screenshot of the error]

Meanwhile, I was sending a reconnect attempt by message every 30s, simulating an automatic system trying to recover.

It worked fine and the nodes connected successfully once we reconnected the PLC, but they look like this:

[image: screenshot of the node status]

It stays in a loop, changing from "connected re-established" to "reconnecting..." and back to "connected re-established".

I know that asking to connect in a loop isn't a good solution, so I'll try to solve it and report back later with the solution. Any tips would be appreciated.

Thanks!

mikakaraila commented 1 year ago

Has @erossignon commented on this one?

RedShift1 commented 1 year ago

I just came across this issue myself. This behaviour appears to have changed between versions: older versions would reconnect by themselves.

RedShift1 commented 1 year ago

I'm in the same boat where the equipment gets turned off at night. I created a separate node with "Action" set to "RE-CONNECT" with the same endpoint I'm using to read the data from:

[image: screenshot of the node configuration]

And then connected it to a cron job node which emits an empty message every 5 seconds:

[image: screenshot of the cron node configuration]

Not sure if it's the right way to do it but it has worked 2 days in a row now.

mikakaraila commented 1 year ago

NOTE: The reconnect action should be used only after you have first disconnected your client. Otherwise it can break the normal reconnect functionality built into node-opcua. The root cause of the connection break should be investigated. It is not normal for communication to break. There must be something in the environment (server or client).
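To illustrate the ordering described in this note, here is a minimal sketch of a Node-RED function node that emits a disconnect message followed by a reconnect message. The helper name `buildReconnectSequence` is hypothetical. The `"reconnect"` action value is the one mentioned earlier in this thread; the `"disconnect"` value is an assumption based on the examples file and should be checked against OPCUA-TEST-NODES.json.

```javascript
// Sketch for a Node-RED function node wired in front of the OPC UA client node.
// ASSUMPTION: the client node reacts to msg.action === "disconnect" and
// msg.action === "reconnect"; verify both values against the examples file.
function buildReconnectSequence(baseMsg) {
    // Copy the incoming message so topic and other properties are preserved.
    const disconnect = Object.assign({}, baseMsg, { action: "disconnect" });
    const reconnect = Object.assign({}, baseMsg, { action: "reconnect" });
    // Order matters: disconnect first, then reconnect.
    return [disconnect, reconnect];
}

// Inside a function node, returning a nested array sends both messages,
// in order, on the first output:
//   return [buildReconnectSequence(msg)];
```

Depending on how quickly the client node tears down its session, a delay node between the two messages may be needed; that is not covered here.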

ErosBaixauli commented 1 year ago

Hello everyone,

Thanks for getting involved in the issue, it's nice to see that the creators are still improving their baby haha

I managed to fix the issue with a connection control flow: when the system stops getting data from the production connection nodes, a timeout branch disconnects all of them and triggers a separate connection node (which receives a "disconnect" order at startup) that starts trying to connect periodically (every 15 to 30 min). When that node connects, all the production nodes receive the order to connect and resume operations. If that reconnection works, the connection control node disconnects until the next time it's needed.

In some cases, I found the production nodes with a dead connection that cannot reconnect, no matter what, until you reset the flow. I handle that case as a false positive: if after 3 attempts the connection hasn't recovered but the control connection node is correctly connected, the flow executes a command to restart its own Docker container. As the flow doesn't depend on local variables to operate, that works fine so far.

I hope this idea helps anyone struggling with this issue. I'd be glad to hear any suggestions or opinions on how to improve it.

If someone wants the flow JSON, I can prepare a sample of the connection flow. I can't share the whole JSON, as it contains sensitive customer data.
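The decision logic of the control flow described above could be sketched roughly as follows. This is not the actual flow from this comment, only a plain JavaScript approximation that could live in a function node. All names (`createWatchdog`, `timeoutMs`, `maxAttempts`, `startedAt`, and the returned action strings) are hypothetical.

```javascript
// Hypothetical watchdog approximating the control flow described above.
// It only decides what to do; wiring the decision to OPC UA nodes is up to the flow.
function createWatchdog({ timeoutMs, maxAttempts, startedAt }) {
    let lastDataAt = startedAt; // time of the last value from a production node
    let attempts = 0;           // reconnect orders sent since the last good data

    return {
        // Call whenever a production node delivers data.
        onData(now) {
            lastDataAt = now;
            attempts = 0;
        },
        // Call periodically. controlConnected tells whether the separate
        // control connection node currently reaches the server.
        tick(now, controlConnected) {
            if (now - lastDataAt < timeoutMs) return "ok";
            // Data is stale and the server is unreachable: keep probing.
            if (!controlConnected) return "probe";
            // Server is reachable but production nodes stay dead: give up
            // after maxAttempts and let the flow restart itself.
            if (attempts >= maxAttempts) return "escalate";
            attempts += 1;
            return "reconnect";
        }
    };
}

// Example: 1 minute without data counts as stale, 3 attempts before escalating.
const watchdog = createWatchdog({ timeoutMs: 60000, maxAttempts: 3, startedAt: Date.now() });
```

Keeping the decision separate from the action makes it possible to swap the "escalate" step (a container restart here) for something less disruptive.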

RedShift1 commented 1 year ago

> In some cases, I found the production nodes with a dead connection issue and cannot reconnect, no matter how, until you reset the flow. I manage that case as a false positive, if after 3 attempts the connection isn't recovering but the control connection node is correctly connected, the flow executes a command to reset its own docker. As the flow doesn't depends on local variables to operate, that works fine so far.

That seems like a real sledgehammer approach... We can't use it because we have numerous other flows running that would be interrupted by restarting Node-RED...

mikakaraila commented 1 year ago

FYI: I have been refactoring the code to TypeScript, and Etienne is building a client2 node that will use just one client & session. Work is in progress...

RedShift1 commented 1 year ago

> NOTE: reconnect action should be used only when you first disconnect your client. It can break normal node-opcua reconnect build in functionality. Root cause for the connection break should be investigated. It is not normal that communication breaks. There must be something in the environment (server or client).

On the client side, this issue started occurring after updating to version 0.2.310 (sorry, I did not keep track of the version that was installed before). On the server side (the PLC we're reading OPC UA data from), no changes were made.

The problem here is that the disconnect is "abnormal" to begin with, but we don't know in advance when it will happen. The machine can get turned off in the evening, or during the day for maintenance, etc. So you can't schedule a reconnect action in advance; the node has to be able to reconnect on its own after failures so it remains reliable.

fanbel commented 1 year ago

I hope I can help with some insights we gathered over the last few months of intensive OPC UA testing (different machines with daily downtimes, e.g. after production):

I hope I described everything understandably (more or less). Feel free to ask questions if anything is unclear.

RedShift1 commented 1 year ago

The other bug report with this issue was closed but the problem's not fixed yet... I downgraded to 0.2.292 and it still fails to reconnect after any kind of failure... I'm sure this worked perfectly in the past but now it doesn't anymore even when downgrading...

RedShift1 commented 1 year ago

Ok I went full sledgehammer on this one... Started a separate Node-RED instance and used the Node-RED HTTP API to reload the flows once it detects it wasn't able to read data for 1 minute. Using the HTTP node you can reload the flows like this:

[image: screenshot of the HTTP request node configuration]

Change URL to your Node-RED's address (mine's running in Docker so localhost:1880 works). If your Node-RED uses authentication you'll need to add another HTTP header, see the HTTP API docs.
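Since the screenshot is not preserved, here is a hedged sketch of what such a request could look like, based on the Node-RED Admin HTTP API (`POST /flows` with the `Node-RED-Deployment-Type: reload` header). It may differ from the configuration in the screenshot. The helper name `buildReloadRequest` and the empty payload are assumptions; the request body is expected to be ignored for a `reload` deployment, but confirm that against the HTTP API docs for your Node-RED version before using it.

```javascript
// Sketch: build the message properties for a Node-RED http request node
// that asks another Node-RED instance to reload its flows.
// ASSUMPTION: the target exposes the Admin API at <baseUrl>/flows.
function buildReloadRequest(baseUrl, accessToken) {
    const headers = {
        "Content-Type": "application/json",
        // Reload flows from storage and restart all nodes.
        "Node-RED-Deployment-Type": "reload"
    };
    // Only needed when the target instance has authentication enabled.
    if (accessToken) {
        headers["Authorization"] = "Bearer " + accessToken;
    }
    return {
        method: "POST",
        url: baseUrl.replace(/\/+$/, "") + "/flows",
        headers: headers,
        payload: {} // assumed to be ignored for "reload"; verify in the docs
    };
}

// Inside a function node placed before an http request node
// (with the node's method set to "- set by msg.method -"):
//   return Object.assign(msg, buildReloadRequest("http://localhost:1880"));
```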