mikakaraila / node-red-contrib-opcua

A Node-RED node to communicate OPC UA. Uses node-opcua library.
Other
209 stars 194 forks source link

Error: "Invalid Channel BadConnectionClosed" #371

Closed OriolFM closed 2 years ago

OriolFM commented 2 years ago

I have been developing a system to read information from some production lines using OPC-UA.

After a bit of a rough start, my system currently includes 4 lines that are working without incidents.

I'm trying to include two more lines from the same vendor in the system, they use the same type of configuration and some of the modules in the machines are identical:

image

However, one of the lines connects in the beginning but later it times out, giving some errors. At the same time, UAExpert does not report any network issues that would justify the connection problems, and just keeps showing the normal parameters.

With the last line is even worse: it does not connect, it reports that the connection attempt has been rejected by server/timed out. However, UAExpert can access and display the values I need to read without any issues, and the configuration data is exactly the same.

image

In addition to that, I do have a catch error node in the same tab, but it does not seem to work with the aforementioned errors.

Any ideas on what would cause that?

mikakaraila commented 2 years ago

Add Debug node before and after client node and run node-red with verbose option. Just to see a bit more what is happening in your flow.

OriolFM commented 2 years ago

I modified the flow like this: image

I attached a zip file with the debug logs from node-red. Logs.zip

mikakaraila commented 2 years ago

One question lower part of the flow Add items is not connected as upper part?

OriolFM commented 2 years ago

It's not because the client node can't connect, and trying to add items will only throw a lot of additional errors.

I disconnected it to reduce the number of errors and make debugging easier.

mikakaraila commented 2 years ago

There are also some other nodes crashing.

Try to reduce first flow to get it working part by part.

ConnectionError: Connection is closed. at Request._query (/root/.node-red/node_modules/mssql/lib/base/request.js:495:37) at Request._query (/root/.node-red/node_modules/mssql/lib/tedious/request.js:367:11) at /root/.node-red/node_modules/mssql/lib/base/request.js:459:12 at new Promise () at Request.query (/root/.node-red/node_modules/mssql/lib/base/request.js:458:12) at connection.node.execSql (/root/.node-red/node_modules/node-red-contrib-mssql-plus/src/mssql.js:423:40) at doSQL (/root/.node-red/node_modules/node-red-contrib-mssql-plus/src/mssql.js:779:25) at /root/.node-red/node_modules/node-red-contrib-mssql-plus/src/mssql.js:765:17 at runMicrotasks () at runNextTicks (internal/process/task_queues.js:60:5) at processImmediate (internal/timers.js:437:9) { code: 'ECONNCLOSED' }

OriolFM commented 2 years ago

I know the mssql node throws errors sometimes, but the reload rate is quite low, and it is not a critical function (it just reads material numbers from an automatic storage machine).

In any case, it is not a critical function for the flows and it is completely detached from the OPC-UA ones.

mikakaraila commented 2 years ago

I just wonder if there is something blocking communication because it is getting timeouts...

mikakaraila commented 2 years ago

There are something with Telegram node? Mon, 10 Jan 2022 10:40:29 GMT node-telegram-bot-api deprecated Automatic enabling of cancellation of promises is deprecated. In the future, you will have to enable it yourself. See https://github.com/yagop/node-telegram-bot-api/issues/319. at internal/modules/cjs/loader.js:1085:14 Ignoring invalid timezone passed to Connection: GMT. This is currently a warning, but in future versions of MySQL2, an error will be thrown if you pass an invalid configuration option to a Connection

Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"] Schmid Multibond 2: Case A: Endpoint does not contain, 1==None 2==Sign 3==Sign&Encrypt securityMode:[3] securityPolicy:["http://opcfoundation.org/UA/SecurityPolicy#Basic256"] Schmid Multibond 2: Case B: UserName & password does not match to server (needed by Sign): undefined undefined

mikakaraila commented 2 years ago

Are you using username & password? I mean Sign & Encrypt. Those seems to be undefined?

OriolFM commented 2 years ago

Are you using username & password? I mean Sign & Encrypt. Those seems to be undefined?

I use self-signed certificate for the connection, the same as I do in UAExpert. I connect to 4 other clients by the same vendor with the same config and I don't have any problems with them. image

OriolFM commented 2 years ago

There are something with Telegram node? Mon, 10 Jan 2022 10:40:29 GMT node-telegram-bot-api deprecated Automatic enabling of cancellation of promises is deprecated. In the future, you will have to enable it yourself. See yagop/node-telegram-bot-api#319. at internal/modules/cjs/loader.js:1085:14 Ignoring invalid timezone passed to Connection: GMT. This is currently a warning, but in future versions of MySQL2, an error will be thrown if you pass an invalid configuration option to a Connection

I am using Telegram nodes to send push notifications to my cell phone, I'll look into it but it is currently working as intended, so it is kind of a low priority.

OriolFM commented 2 years ago

I just wonder if there is something blocking communication because it is getting timeouts...

There shouldn't be. All machines are in the same network and subnet except for the node-red server, but there is already a firewall exception for that subnet. I've asked IT to double check those specific IPs to make sure, but if the machines were blocked they would not connect at all, and at least one of them connects sometimes and reads values correctly, just sometimes loses the connection and throws the error.

And the other one does not work at all.

mikakaraila commented 2 years ago

OK, what about node-red console? Anything there? Are you running it with "node-red -v flow.json"

Ooption -v will activate verbose mode.

mikakaraila commented 2 years ago

It could be something with the certificates.

I found these: https://github.com/open62541/open62541/issues/2562 https://github.com/node-opcua/node-opcua/issues/960

If you have Siemens OPC UA Server that you are connecting...

OriolFM commented 2 years ago

It could be something with the certificates.

I found these: open62541/open62541#2562 node-opcua/node-opcua#960

If you have Siemens OPC UA Server that you are connecting...

Seems that my case is similar to those, since I can connect without problems from UAExpert but get problems when connecting from the OPC-UA client node.

What puzzles me is that all the OPC-UA clients I connect are from the same vendor, using the same hardware (S7-300 PLCs), and all of them were installed at the same time, so it stands to reason that all of them run also the same firmware (can't say for sure because I only have external access, not direct access to the PLC).

OriolFM commented 2 years ago

OK, what about node-red console? Anything there? Are you running it with "node-red -v flow.json"

Ooption -v will activate verbose mode.

I'm running node-red with the PM2 manager, I activated the "debug" option in settings so the output includes more details.

PM2 automatically generates logs for console output and errors in separate files (like the ones I sent), and compresses it after a while. image

mikakaraila commented 2 years ago

Check this folder: /root/.config/node-red-opcua-nodejs/PKI/trusted/certs

It should contain certificate files. In the error log it reports these: SchmidOpcUaServer[16fbb0103614dd5602407a4f91e152619308eba8].pem SchmidOpcUaServer[b3834cb5456d11b03627aca50ebbf79c3e64cb9b].pem SchmidOpcUaServer[180fee7d0739e3e46e81970f2edc3667f37632fb].pem

I wonder why it has so many certificates from the server... with different finger print (hex value).

Can you compare UaExpert trusted certificates?

And check also rejected folder!

OriolFM commented 2 years ago

My flows connect to 8 different OPC-UA servers, and there are 8 certificates in the folder, so I think that is correct. I stopped node-red and deleted them, to force the clients to generate new certificates. They do it, but I keep having the same problem with these two clients (the other 6 work fine without any error, with the same exact configuration).

The UAExpert certificates are in a different format (.der), I'll try converting them and replacing the ones in the folder.

Rejected folder is empty.

mikakaraila commented 2 years ago

OK then it looks correct. Have you checked those two servers with UaExpert server diagnostics tabs? They have enough resources and no error counters increasing?

mikakaraila commented 2 years ago

New idea, if S7-300 is overloaded with read multiple, add one client node that will first register each nodeId. Client node has action "REGISTER" it will inform server to prepare that the nodeId will be read again after some interval. image

OriolFM commented 2 years ago

OK then it looks correct. Have you checked those two servers with UaExpert server diagnostics tabs? They have enough resources and no error counters increasing?

The one I can't connect to from node-red only shows 2 sessions (the one from the machine's HMI and the one from my laptop), and shows no errors otherwise. image

The one I can connect but that gives me errors from time to time shows three sessions (one from the machine's HMI, one from my laptop, and one from node-red) but also shows no connection errors... which is puzzling. image

Either way, it doesn't look like it's struggling.

OriolFM commented 2 years ago

New idea, if S7-300 is overloaded with read multiple, add one client node that will first register each nodeId. Client node has action "REGISTER" it will inform server to prepare that the nodeId will be read again after some interval. image

You mean use a client node just to register the node IDs, and another one for the read multiple?

So, on starting the flows, I would clear the list first and then register all the node IDs, same as I do now but on the REGISTER client node, and then use the node I currently have to read, both on the same endpoint, if I understood correctly?

OriolFM commented 2 years ago

Can you compare UaExpert trusted certificates?

I made a test, copying the UAExpert certificates to the trusted/certs folder (after changing the file type to .pem) and I deleted all the other certificates.

Upon restarting node-red, all the missing certificates are generated anew except for the two which are present. The error behaviour keeps being the same (Multibond 1 client having errors from time to time, Multibond 2 not being able to connect despite UAExpert being able to do so with the same certificate).

So I don't think it's the certificates, either.

mikakaraila commented 2 years ago

First items (nodeIds) are registered with one client node and then same items are read with readmultiple client node. S7-300 should work faster if nodeId is first registered (need to do once for each nodeId) as flow is initialized. Can you check CPU load on S7 or available memory?

OriolFM commented 2 years ago

First items (nodeIds) are registered with one client node and then same items are read with readmultiple client node. S7-300 should work faster if nodeId is first registered (need to do once for each nodeId) as flow is initialized. Can you check CPU load on S7 or available memory?

I'm afraid I do not have direct access to the PLC (that is the whole point of OPC-UA, or I would be using the node-S7 instead to read the data from a memory block, as I do for other machines).

mikakaraila commented 2 years ago

Register nodes is part of the OPC UA specification: https://reference.opcfoundation.org/v104/Core/docs/Part4/5.8.5/

I will make you a small example flow, just a moment.

mikakaraila commented 2 years ago

Here (updated example):

image

REGISTER.txt

mikakaraila commented 2 years ago

Interesting comment on node-red forum (your discussion thread): https://discourse.nodered.org/t/reading-multiple-items-with-node-red-contrib-opcua/54259/6

EDIT: Could figure it out. Needed to raise Intervaltime for OPC-Communication on S7 side.

OriolFM commented 2 years ago

Interesting comment on node-red forum (your discussion thread): https://discourse.nodered.org/t/reading-multiple-items-with-node-red-contrib-opcua/54259/6

EDIT: Could figure it out. Needed to raise Intervaltime for OPC-Communication on S7 side.

Interesting. I will try to contact the supplier about it, although it may prove difficult to fix, considering that the warranty period is over and no one bothered to look at these functions earlier.

This could be the answer to the timeouts in one of the clients, but the one that can't be connected to might prove more difficult. I'll take a look and report back in case this works, so others with similar problems can find the solution.

mikakaraila commented 2 years ago

I expect you just have to give more execution time for the OPC UA communication.

Siemens manual: https://cache.industry.siemens.com/dl/files/400/45351400/att_963196/v1/HelpFile_OPC_32_en.pdf

mikakaraila commented 2 years ago

As same setup works with other PLCs, it must be something with the specific one. How many items you are reading? Have you tested with one or two items?

OriolFM commented 2 years ago

I managed to talk with the supplier yesterday, it appears they do not use the PLC to serve as OPC-UA server, but a PC running windows embedded and UAExpert Server, which is the same one that runs the HMI dashboard for the machine.

I checked the firewall settings on the PC, I thought maybe it was blocking some of the ports and that was why I could not connect to one of the lines, but the firewall is disabled.

Because the machine is a production line with many modules, I often add the modules one by one and make sure they are running fine and that I receive the data. I start with module 1, which basically has only three NodeIDs that I use (an integer for machine status, a boolean that confirms if the machine is running or not, and an int32 for the conveyor speed).

The timeout errors have been happening since that point, but because they do not happen all the time, I can still receive some of the information (but you can see the difference between the total request count and the read count since yesterday) image

By the way, I already cleared the log errors with the mssql nodes that you mentioned, I just needed to add a short delay between the point where I start the flows and I begin quering the DBs... the connection was not open yet and the first few queries were failing... as I guessed, nothing related to the current problem.

mikakaraila commented 2 years ago

I expect PLC is perhaps under some other real-time load as you say: "The timeout errors have been happening since that point, but because they do not happen all the time, I can still receive some of the information (but you can see the difference between the total request count and the read count since yesterday)"

It could help if vendor changes OPC UA server settings, but of course it will need most probably short production break.

OriolFM commented 2 years ago

I've checked with the supplier, the PLCs are fine, and I've tested with UAExpert from other computers in the network, and found no error.

I applied the last update on the opc-ua node via the manage pallette, and now one of the lines that was working can't connect anymore: image

I don't know what this is, but it is definitely not the network, the PLC, or the PC where the OPC-UA server is running.

I am quite clueless on how to proceed from here, these seem like arbitrary errors unrelated to the config (one line was working, I applied the update, and now it's not).

mikakaraila commented 2 years ago

Set these environment variables before you start node-red: set NODEOPCUADEBUG="CLIENT{TRACE-REQUEST-RESPONSE}" set NODEOPCUADEBUG="SERVER{TRACE-REQUEST-RESPONSE}" set ARRAY_LENGTH=2000 set NODEOPCUA_DEBUG_MAXLINE_PER_MESSAGE=200

Then we will see what is happening at the protocol level. Console output will be full of debug messages.

OriolFM commented 2 years ago

set NODEOPCUADEBUG="CLIENT{TRACE-REQUEST-RESPONSE}" set NODEOPCUADEBUG="SERVER{TRACE-REQUEST-RESPONSE}" set ARRAY_LENGTH=2000 set NODEOPCUA_DEBUG_MAXLINE_PER_MESSAGE=200

Where do I set those? I tried adding these lines to settings.js but they don't work:

process.env.NODEOPCUADEBUG = "CLIENT{TRACE-REQUEST-RESPONSE}";
process.env.NODEOPCUADEBUG = "SERVER{TRACE-REQUEST-RESPONSE}";
process.env.ARRAY_LENGTH = 2000;
process.env.NODEOPCUA_DEBUG_MAXLINE_PER_MESSAGE = 200;

EDIT: Nevermind, I think I got it. It was about the location of those lines in the settings.js file.

OriolFM commented 2 years ago

Here are the logs. I stopped node-red, deleted all the old logs, and started clean in verbose mode.

I am running node-red with the pm2 daemon, it stores the log output and error output separately in two files, which I have zipped together. There are some timeouts and connection errors.

Strangely, the line that stopped working after the update is working again, but the other two are still not running properly.

node-red logs.zip

mikakaraila commented 2 years ago

Those environment variables are not activated. I expect you should edit pm2-init.sh and add them there.

There should be this kind of messages in the log:

13:55:59:724 <<<<<< _on_message_received 3 CreateSessionResponse Good (0x00000) 13:55:59:724 >>>>>> 4 ActivateSessionRequest 13:55:59:761 <<<<<< _on_message_received 3 CreateSessionResponse Good (0x00000) 13:55:59:762 >>>>>> 5 ActivateSessionRequest 13:55:59:786 <<<<<< _on_message_received 4 ActivateSessionResponse Good (0x00000) 13:55:59:788 >>>>>> 5 TransferSubscriptionsRequest 13:55:59:803 <<<<<< _on_message_received 5 ActivateSessionResponse Good (0x00000) 13:55:59:804 >>>>>> 6 PublishRequest 13:55:59:805 >>>>>> 7 PublishRequest 13:55:59:805 >>>>>> 8 PublishRequest 13:55:59:805 >>>>>> 9 PublishRequest 13:55:59:805 >>>>>> 10 PublishRequest 13:55:59:806 <<<<<< _on_message_received 5 TransferSubscriptionsResponse Good (0x00000) 13:55:59:807 >>>>>> 6 CreateSubscriptionRequest 13:55:59:830 <<<<<< _on_message_received 6 PublishResponse BadNoSubscription (0x80790000) 13:55:59:833 <<<<<< _on_message_received 6 CreateSubscriptionResponse Good (0x00000) 13:55:59:833 >>>>>> 7 CreateMonitoredItemsRequest 13:55:59:848 <<<<<< _on_message_received 7 PublishResponse BadNoSubscription (0x80790000) 13:55:59:857 <<<<<< _on_message_received 7 CreateMonitoredItemsResponse Good (0x00000) 13:55:59:858 >>>>>> 8 RepublishRequest 13:55:59:864 <<<<<< _on_message_received 8 RepublishResponse BadMessageNotAvailable (0x807b0000) 13:55:59:865 >>>>>> 9 PublishRequest 13:55:59:867 >>>>>> 10 PublishRequest 13:55:59:867 >>>>>> 11 PublishRequest 13:55:59:869 >>>>>> 12 PublishRequest 13:55:59:869 >>>>>> 13 PublishRequest

mikakaraila commented 2 years ago

File pm2-init.sh add these lines:

export NODEOPCUADEBUG="CLIENT{TRACE-REQUEST-RESPONSE}" export NODEOPCUADEBUG="SERVER{TRACE-REQUEST-RESPONSE}" export ARRAY_LENGTH=2000 export NODEOPCUA_DEBUG_MAXLINE_PER_MESSAGE=200

OriolFM commented 2 years ago

There is no file pm2-init.sh in the pm2 folder or in /usr/bin.

I checked the pm2 documentation and found that I need an environment configuration file, and call node-red from there.

Environment variables look like they're there, although it looks like NODEOPCUADEBUG is overwritten because it is assigned two different values. image

I do not see that the logs resemble what you point out, even though the variables look like they're active...

In any case, here is what I get: node-red logs.zip

mikakaraila commented 2 years ago

Hmm for some reason still no more debug output.

As we DEBUG client only this variable value is needed and it must contain "" characters. export NODEOPCUADEBUG="CLIENT{TRACE-REQUEST-RESPONSE}"

One more time...

OriolFM commented 2 years ago

It is a string that must contain the "" characters, or the characters are to define the string?

The pm2 ecosystem file that launches node-red with environment variables is as follows:

module.exports = {
    apps : [{
        script: '/usr/bin/node-red',
        env: {
            NODEOPCUADEBUG : "CLIENT{TRACE-REQUEST-RESPONSE}",
            ARRAY_LENGTH : 2000,
            NODEOPCUA_DEBUG_MAXLINE_PER_MESSAGE : 200
        }
    }],

};

I removed the server one so it will not overwrite the client.

EDIT: Since it didn't look like it was working, I used escape characters in the environment file, now it looks like this:

module.exports = {
    apps : [{
        script: '/usr/bin/node-red',
        env: {
            NODEOPCUADEBUG : "\"CLIENT{TRACE-REQUEST-RESPONSE}\"",
            ARRAY_LENGTH : 2000,
            NODEOPCUA_DEBUG_MAXLINE_PER_MESSAGE : 200
        }
    }],

};

Now the environment variables look like this: image

Logs: node-red logs.zip

mikakaraila commented 2 years ago

Did you restart pm or computer? It is not printing debug outputs.

But this is the error that for some reason client cannot get connection to server: Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"] Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"] Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Connection Break"] Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Invalid Channel BadConnectionClosed"]

After some time client retries to connect and server just does not response (debug will show protocol level messages): Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Invalid Channel BadConnectionClosed"] Schmid Multibond 1: !!!!!!!!!!!!!!!!!!!!!!!! Starting Reconnection !!!!!!!!!!!!!!!!!!! Node: Schmid Multibond 1 Schmid Multibond 1: !!!!!!!!!!!!!!!!!!!!!!!! CONNECTION RE-ESTABLISHED !!!!!!!!!!!!!!!!!!! Node: Schmid Multibond 1 Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"]

OriolFM commented 2 years ago

Did you restart pm or computer? It is not printing debug outputs.

But this is the error that for some reason client cannot get connection to server: Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"] Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"] Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Connection Break"] Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Invalid Channel BadConnectionClosed"]

After some time client retries to connect and server just does not response (debug will show protocol level messages): Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Invalid Channel BadConnectionClosed"] Schmid Multibond 1: !!!!!!!!!!!!!!!!!!!!!!!! Starting Reconnection !!!!!!!!!!!!!!!!!!! Node: Schmid Multibond 1 Schmid Multibond 1: !!!!!!!!!!!!!!!!!!!!!!!! CONNECTION RE-ESTABLISHED !!!!!!!!!!!!!!!!!!! Node: Schmid Multibond 1 Client node error on: Schmid Multibond 1 error: ["Schmid Multibond 1 error at active reading: Transaction has timed out ( timeout = 60000 ms , request = ReadRequest)"]

Yes, I restarted pm2 repeatedly. the environment variables appear on the list, with the quotes (see the screenshot in my last post).

I noticed the client Multibond 1 times out and the connection is broken, but this has nothing to do with the network or the computers. UAExpert can reach the server without a hitch from several computers.

I also tried a new, clean node-red flow from another computer with only this machine and the errors come out exactly the same, with the same behaviour and messages.

I am using the exact same configuration on the OPC-UA client nodes that I use to connect successfully to 6 other machines from the same vendor (Basic256, sign and encript, self-certificate). It is the same config that I use to browse with UAExpert client, and it is displaying the values normally with no reported error, timeout, or disconnection.

The vendor checked, and their server configuration is the same in all their machines (they are using UAExpert server on their windows embedded HMI and not using the Siemens PLC as I initially thought).

I'm officially out of ideas.

mikakaraila commented 2 years ago

Is it same pm2 instance that connects to other servers? If not, can you change working one to connect to one that timeouts? Just to see if it is something on that specific instance (computer) that cannot get connection.

Just to try to isolate is the problem on server side or client side (environment or some external settings).

OriolFM commented 2 years ago

Everything is running on the same pm2 instance, inside a Linux dedicated server, on a single node-red flow. There's nothing else running with it.

When I tested with a clean flow with only the two clients that were giving me trouble, I did it so from a node-red install on a Windows server, that does not use pm2. I wanted to make sure that was a different server on a different subnet, in case there were IP/subnet issues (which shouldn't be, since my IT already discarded that possibility). At the same time, I opened UAExpert on the same testing Windows machine, and it did not have any issues connecting to both troublesome OPC-UA servers, at the same time the node-red test flow declared that the server was not reachable.

mikakaraila commented 2 years ago

Can you read all nodeIds with the UaExpert?

OriolFM commented 2 years ago

Can you read all nodeIds with the UaExpert?

Yes. I can browse them all and I can add them to the Data view to see the current values.

mikakaraila commented 2 years ago

Now I am also out of ideas... only way to get some further information would be to use Wireshark to get network capture to see what client will send to server and then response or nothing from the server as connection timeouts.

Perhaps it is better to setup another computer to run only client that will timeout to make this capture.

OriolFM commented 2 years ago

I'll discuss it with IT next week, thanks :)