mikakaraila / node-red-contrib-opcua

A Node-RED node to communicate via OPC UA. Uses the node-opcua library.

Error: "Invalid Channel BadConnectionClosed" #371

Closed: OriolFM closed this issue 2 years ago.

OriolFM commented 2 years ago

I have been developing a system to read information from some production lines using OPC-UA.

After a bit of a rough start, my system currently includes 4 lines that are working without incident.

I'm trying to add two more lines from the same vendor to the system. They use the same type of configuration, and some of the modules in the machines are identical:

[image]

However, one of the lines connects at first but later times out with some errors. Meanwhile, UAExpert does not report any network issues that would explain the connection problems, and just keeps showing the normal parameters.

The last line is even worse: it does not connect at all, and reports that the connection attempt was rejected by the server or timed out. However, UAExpert can access and display the values I need to read without any issues, using exactly the same configuration data.

[image]

In addition, I have a catch node in the same tab, but it does not seem to catch the errors mentioned above.

Any ideas on what would cause that?

mikakaraila commented 2 years ago

A couple of things could still be tested:
1) Define a separate certificate store folder for each client (e.g. client1 for the first client, etc.), since all the certificates are currently in the same folder.
2) Run Node-RED without verbose mode. This cuts out the debugging messages, in case they slow down program execution.

mikakaraila commented 2 years ago

For official support, you can contact Etienne Rossignon: etienne.rossignon@sterfive.com

mikakaraila commented 2 years ago

Can you test an "empty" flow, i.e. just the client nodes without anything else? Do they reach the connected / session active state?

If yes, then the problem is perhaps that the flow injects so many blocking read actions that not all client nodes can connect. For example, if each client with readmultiple takes 10 s to get its values, then 6 × readmultiple = 60 s, so client number 7 cannot connect because it times out.

If this is the case, the flow should not run all the readmultiple actions in one sequence, or the timeout should be increased.
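A minimal TypeScript sketch of that arithmetic, assuming every readmultiple blocks for the same 10 s and all of them run strictly one after another. The 60 s connection timeout is an assumed value chosen to match the example, and the function names are hypothetical, not part of node-red-contrib-opcua.

```typescript
// Hypothetical model: all readmultiple actions run in one sequence, each
// blocking for readMs, so a client must wait for every earlier read to finish.

// Time the client at `position` (1-based) waits before its turn.
function waitBeforeTurnMs(position: number, readMs: number): number {
  return (position - 1) * readMs;
}

// A client times out if its wait reaches the connection timeout.
function timesOut(position: number, readMs: number, timeoutMs: number): boolean {
  return waitBeforeTurnMs(position, readMs) >= timeoutMs;
}

// Largest position that still gets its turn before the timeout.
function maxClientsBeforeTimeout(readMs: number, timeoutMs: number): number {
  return Math.ceil(timeoutMs / readMs);
}

const readMs = 10_000;    // 10 s per readmultiple, as in the example
const timeoutMs = 60_000; // assumed connection timeout (hypothetical)

for (let position = 1; position <= 7; position++) {
  const status = timesOut(position, readMs, timeoutMs) ? "times out" : "connects";
  console.log(`client ${position}: waits ${waitBeforeTurnMs(position, readMs) / 1000} s, ${status}`);
}
```

With these numbers, clients 1 to 6 get their turn and client 7 times out. In this model, spreading the reads out or raising the timeout above the total queue time both let the last client connect.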

Test this kind of flow first, then add readmultiple nodes until the client(s) can no longer connect: [image]

You should also check how long it takes to read multiple items: [image]

In this case reading one item took just 10 ms, but the response time depends on the server, its load, etc., so it is better to measure it: [image]
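One way to do that measurement, sketched in TypeScript: `timed` wraps any promise-returning operation and reports how long it took. `fakeReadMultiple` is a hypothetical placeholder that only waits; in a real flow the actual OPC UA read would go in its place.

```typescript
// Measure how long an async operation takes, in milliseconds.
async function timed<T>(op: () => Promise<T>): Promise<{ result: T; elapsedMs: number }> {
  const start = Date.now();
  const result = await op();
  return { result, elapsedMs: Date.now() - start };
}

// Hypothetical stand-in for a real readmultiple call: resolves with fake
// values after delayMs. Replace it with the actual read in your own code.
function fakeReadMultiple(delayMs: number): Promise<number[]> {
  return new Promise((resolve) => setTimeout(() => resolve([1, 2, 3]), delayMs));
}

timed(() => fakeReadMultiple(10)).then(({ result, elapsedMs }) => {
  console.log(`read ${result.length} items in ${elapsedMs} ms`);
});
```

Logging the elapsed time for each client's read gives the per-read figure needed for the sequence arithmetic above.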

OriolFM commented 2 years ago

I tested the empty flow from the Windows machine. The results are similar to the other flow: one line can connect, the other can't. [image]

EDIT: The client that can connect also suffers timeouts despite having nothing in it: [image]

This is the debug output: [image]

After running all morning, the verbose output on the Node-RED server looks like this: [image]

I talked with IT and we'll do the Wireshark test tomorrow morning.

mikakaraila commented 2 years ago

For client1, as there are no messages, it will time out and the keepalive will reconnect it.
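The behaviour described for client1 can be pictured as an idle timer that a periodic keepalive checks. This is a generic sketch with hypothetical names and timings (20 s idle timeout, 5 s keepalive tick), not the actual node-red-contrib-opcua or node-opcua implementation.

```typescript
// Generic idle-timeout watcher. NOT the real node-red-contrib-opcua/node-opcua logic.
class IdleWatch {
  private readonly idleTimeoutMs: number;
  private lastActivityMs: number;

  constructor(idleTimeoutMs: number, nowMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastActivityMs = nowMs;
  }

  // Call whenever a message goes through the client (or a new connection opens).
  recordActivity(nowMs: number): void {
    this.lastActivityMs = nowMs;
  }

  // True once the client has been idle for at least idleTimeoutMs.
  isIdleTimedOut(nowMs: number): boolean {
    return nowMs - this.lastActivityMs >= this.idleTimeoutMs;
  }
}

// Simulate a client that never sends messages: a keepalive tick every tickMs
// checks the watcher and reconnects when the idle timeout has been reached.
// Returns the simulated times at which a reconnect happened.
function simulateIdleClient(idleTimeoutMs: number, tickMs: number, durationMs: number): number[] {
  const watch = new IdleWatch(idleTimeoutMs, 0);
  const reconnects: number[] = [];
  for (let t = tickMs; t <= durationMs; t += tickMs) {
    if (watch.isIdleTimedOut(t)) {
      reconnects.push(t);
      watch.recordActivity(t); // the fresh connection resets the idle timer
    }
  }
  return reconnects;
}

console.log(simulateIdleClient(20_000, 5_000, 60_000)); // [ 20000, 40000, 60000 ]
```

In this model an idle client is reconnected every 20 s, which is consistent with an empty client showing periodic timeouts without anything actually being broken.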

But since client2 will not connect, have you tested pinging from the pm2 computer to the server and from the server back to pm2? Just an additional check that the computers can see each other in both directions.

mikakaraila commented 2 years ago

For the Wireshark capture, don't use encryption: use None/None or Sign, so that it is possible to read the messages.
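The reasoning, as a small sketch: the mode names follow the OPC UA `MessageSecurityMode` values that also appear in the endpoint log later in this thread, while the helper function is only an illustration. With None or Sign the message body is sent unencrypted (Sign only adds a signature), so a capture tool can decode it; with SignAndEncrypt it cannot.

```typescript
// OPC UA MessageSecurityMode values as they appear in endpoint descriptions.
type MessageSecurityMode = "None" | "Sign" | "SignAndEncrypt";

// None: no security. Sign: signed but sent in clear text.
// SignAndEncrypt: signed and encrypted, so a capture only shows ciphertext.
function isReadableInCapture(mode: MessageSecurityMode): boolean {
  return mode !== "SignAndEncrypt";
}

const modes: MessageSecurityMode[] = ["None", "Sign", "SignAndEncrypt"];
for (const mode of modes) {
  console.log(`${mode}: ${isReadableInCapture(mode) ? "readable" : "encrypted"}`);
}
```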

mikakaraila commented 2 years ago

Any news or updates?

OriolFM commented 2 years ago

I haven't been able to try Wireshark yet (IT is busy updating our SAP servers; it'll take a few days).

However, after several reboots and retries, I think the logs now show the debug info you requested. I disabled all the clients that were working properly in the flows, and only left the two that are malfunctioning.

Attached are the logs. node-red logs.zip

mikakaraila commented 2 years ago

Could it be that the hostname and IP address don't match? Hostname = opc.tcp://n26-707973:4840/

08:11:39.873Z :client_secure_channel_layer :935 { /GetEndpointsResponse/
...   responseHeader / ResponseHeader /: {
...     timestamp / DateTime /: 2022-01-27T08:09:08.031Z
...     requestHandle / UInt32 /: 2 0x2
...     serviceResult / StatusCode /: Good (0x00000)
...     serviceDiagnostics / DiagnosticInfo /: { /DiagnosticInfo/
...       namespaceUri / Int32 /: null
...       symbolicId / Int32 /: -1
...       locale / Int32 /: -1
...       localizedText / Int32 /: -1
...       additionalInfo / String /: null
...       innerStatusCode / StatusCode /: Good (0x00000)
...       innerDiagnosticInfo / DiagnosticInfo /: null
...     };
...     stringTable / UAString [] /: [ / empty/ ]
...     additionalHeader / ExtensionObject /: null
...   }
...   endpoints / EndpointDescripti[] /: [
...     { /0/
...       endpointUrl / UAString /: opc.tcp://n26-707973:4840/
...       server / ApplicationDescri /: {
...         applicationUri / UAString /: urn:n26-707973:Schmid:SchmidOpcUaServer
...         productUri / UAString /: urn:Schmid:SchmidOpcUaServer
...         applicationName / LocalizedText /: locale=null text=SCHMID OPC-UA Server[N26-707973]
...         applicationType / ApplicationType /: ApplicationType.Server ( 0)
...         gatewayServerUri / UAString /: null
...         discoveryProfileUri / UAString /: null
...         discoveryUrls / UAString [] /: [ / length =1/
...           opc.tcp://N26-707973:4840/
...         ]
...       }
...       serverCertificate / ByteString /
...         Buffer:
...         308204963082037ea003020102021100...c57983675544f02b2269511857c4dcda
...       securityMode / MessageSecurityMo /: MessageSecurityMode.SignAndEncrypt ( 3)
...       securityPolicyUri / UAString /: http://opcfoundation.org/UA/SecurityPolicy#Basic256Sha256
...       userIdentityTokens / UserTokenPolicy [] /: [
...         { /0/
...           policyId / UAString /: 0
...           tokenType / UserTokenType /: UserTokenType.Anonymous ( 0)
...           issuedTokenType / UAString /: null
...           issuerEndpointUrl / UAString /: null
...           securityPolicyUri / UAString /: null
...         },
...         { /1/
...           policyId / UAString /: 1
...           tokenType / UserTokenType /: UserTokenType.UserName ( 1)
...           issuedTokenType / UAString /: null
...           issuerEndpointUrl / UAString /: null
...           securityPolicyUri / UAString /: null
...         },
...         { /2/
...           policyId / UAString /: 2
...           tokenType / UserTokenType /: UserTokenType.Certificate ( 2)
...           issuedTokenType / UAString /: null
...           issuerEndpointUrl / UAString /: null
...           securityPolicyUri / UAString /: null
...         }
...       ]
...       transportProfileUri / UAString /: http://opcfoundation.org/UA-Profile/Transport/uatcp-uasc-uabinary
...       securityLevel / Byte /: 3
...     },
...     { /1/
...       endpointUrl / UAString /: opc.tcp://n26-707973:4840/
...       server / ApplicationDescri /: {
...         applicationUri / UAString /: urn:n26-707973:Schmid:SchmidOpcUaServer
...         productUri / UAString /: urn:Schmid:SchmidOpcUaServer
...         applicationName / LocalizedText /: locale=null text=SCHMID OPC-UA Server[N26-707973]
...         applicationType / ApplicationType /: ApplicationType.Server ( 0)
...         gatewayServerUri / UAString /: null
...         discoveryProfileUri / UAString /: null
...         discoveryUrls / UAString [] /: [ / length =1/
...           opc.tcp://N26-707973:4840/
...         ]
...       }
...       serverCertificate / ByteString /
...         Buffer:
...         308204963082037ea003020102021100...c57983675544f02b2269511857c4dcda
...       securityMode / MessageSecurityMo /: MessageSecurityMode.SignAndEncrypt ( 3)
...       securityPolicyUri / UAString /: http://opcfoundation.org/UA/SecurityPolicy#Basic256
...       userIdentityTokens / UserTokenPolicy [] /: [
...         { /0/
...           policyId / UAString /: 0
...           tokenType / UserTokenType /: UserTokenType.Anonymous ( 0)
...           issuedTokenType / UAString /: null
...           issuerEndpointUrl / UAString /: null
...           securityPolicyUri / UAString /: null
...         },
...         { /1/
...           policyId / UAString /: 1
...           tokenType / UserTokenType /: UserTokenType.UserName ( 1)
...           issuedTokenType / UAString /: null
...           issuerEndpointUrl / UAString /: null
...           securityPolicyUri / UAString /: null
...         },
...         { /2/
...           policyId / UAString /: 2
...           tokenType / UserTokenType /: UserTokenType.Certificate ( 2)
...           issuedTokenType / UAString /: null
...           issuerEndpointUrl / UAString /: null
...           securityPolicyUri / UAString /: null
...         }
...       ]
...       transportProfileUri / UAString /: http://opcfoundation.org/UA-Profile/Transport/uatcp-uasc-uabinary
...       securityLevel / Byte /: 2
...     }
...   ]
... };
08:11:39:875 >>>>>> ------ C 1 3322 3 GetEndpointsRequest s= 68
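In the response above, every endpointUrl advertises the hostname n26-707973, while the client was configured with an IP address. A hedged TypeScript sketch of that comparison; the helper names and the IP 192.168.1.50 are hypothetical, and IPv6 literals in brackets are not handled.

```typescript
// Returns the host part of an opc.tcp:// URL, or null if the URL does not match.
// IPv6 literals such as [::1] are not handled in this sketch.
function endpointHost(endpointUrl: string): string | null {
  const match = /^opc\.tcp:\/\/([^:/]+)(?::\d+)?(?:\/.*)?$/i.exec(endpointUrl);
  return match ? match[1] : null;
}

// Compares the advertised host with the host the client was configured with.
// Hostnames are compared case-insensitively (the log shows n26-707973 and N26-707973).
function hostMatches(endpointUrl: string, configuredHost: string): boolean {
  const host = endpointHost(endpointUrl);
  return host !== null && host.toLowerCase() === configuredHost.toLowerCase();
}

const advertised = "opc.tcp://n26-707973:4840/";
console.log(endpointHost(advertised));                // n26-707973
console.log(hostMatches(advertised, "192.168.1.50")); // false (hypothetical configured IP)
```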

OriolFM commented 2 years ago

> Could it be that the hostname and IP address don't match? Hostname = opc.tcp://n26-707973:4840/

In Node-RED I'm not using any hostname; I only use the IP address because, as far as I know, that hostname is just the server's OPC-UA name and is not used for network routing. When I connect from UAExpert, I also add the server by IP.

For the six other machines from the same vendor that are working, I do it exactly the same way, and I haven't had any problems.

Of course, if I try to ping that hostname in our network it says it does not exist. It is strange that it is looking for that hostname and not the IP that I typed into the configuration field.

mikakaraila commented 2 years ago

The certificate uses the hostname, which is why it appears in the endpoint, but normally the IP should be resolved from the hostname via DNS. Could that cause the connection problem? Can you ping the other servers by IP and by hostname?

OriolFM commented 2 years ago

I can't ping them using those hostnames.

I noticed that when I connect from UAExpert, it encounters an error and then replaces the hostname with the IP address to reach the endpoint:

[image]

mikakaraila commented 2 years ago

OK, does the same happen with the working ones? If not, then you are closer to the actual root cause of the failing connection.

I expect it has something to do with DNS or DNS cache... ask IT.

OriolFM commented 2 years ago

This is from two other machines that work. The output is different: [image]

It does say the hostname does not match the IP (which is expected, since we don't resolve hostnames on the intranet; we work only with IPs).

Both machines work in node-red without a hitch.

OriolFM commented 2 years ago

I did a test: I added the OPC-UA hostnames to the hosts file of the Node-RED server VM, so now I can ping them successfully: [image]

But there still seem to be issues, either retrieving the certificate or with the username/password (according to the Node-RED output).

If I try to connect from UAExpert, it points out that it is replacing the hostname with the IP in all nodes to be able to access them (and it works). It also points out that the certificate does not match the hostname (presumably because the certificate is issued for the IP and the OPC-UA server uses the hostname).

Does the opc-ua node have a similar fallback to replace the hostname for the IP if the connection fails?
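The thread does not say whether node-red-contrib-opcua has such a fallback. Purely as an illustration of what the substitution UAExpert reports amounts to, a sketch that swaps the host of an advertised endpoint URL for the host the client was configured with, keeping port and path. The function name and IP address are hypothetical.

```typescript
// Hypothetical fallback: keep scheme, port and path of the advertised endpoint
// URL, but connect to the host the user configured (e.g. an IP address).
function replaceEndpointHost(endpointUrl: string, connectHost: string): string {
  const match = /^(opc\.tcp:\/\/)([^:/]+)(.*)$/i.exec(endpointUrl);
  if (!match) {
    throw new Error(`Not an opc.tcp URL: ${endpointUrl}`);
  }
  const scheme = match[1];
  const rest = match[3]; // ":port/path", possibly empty
  return `${scheme}${connectHost}${rest}`;
}

console.log(replaceEndpointHost("opc.tcp://n26-707973:4840/", "192.168.1.50"));
// opc.tcp://192.168.1.50:4840/
```

This only changes where the client connects. The server certificate is untouched, so a certificate/host mismatch like the one UAExpert warns about would still be reported.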

mikakaraila commented 2 years ago

The Node-RED output would help me; otherwise I have to guess what the actual problem is.

You can, however, use openssl to generate certificates with extra information. Currently the installation generates them silently in the background. node-opcua-pki contains some commands and examples of how to generate and inspect certificates.

But as I don't know the exact problem or error, I cannot give more instructions.

mikakaraila commented 2 years ago

Any updates or log from the last error?

mikakaraila commented 2 years ago

Closed, as no more information is available.

OriolFM commented 2 years ago

Sorry, I didn't have much time to spend on this (there were more pressing issues at work, and I couldn't get support from IT for the Wireshark capture).

I still have the same problem. All 9 machines from this vendor seem to be running the same OPC-UA server version with the same configuration.

[image]

We think the unstable behaviour (the timeouts) may be hardware-related: the computers running the OPC-UA servers use Windows 7 Embedded with 4 GB of RAM and a regular HDD. The ones with errors are at 85-90% RAM usage all the time, whereas the rest are at about 45-50%. We will upgrade those to 8 GB of RAM, replace the HDD with an SSD, and see whether the timeouts persist.

We still have no idea where the connection errors come from.

mikakaraila commented 2 years ago

It is clear from the error message: Hostname or IP address does not match the hostname or IP address the client connected...

OriolFM commented 2 years ago

I don't understand. I configure a machine IP and I get the data from that machine. How can the IP be a different one?

mikakaraila commented 2 years ago

If I remember correctly, you have some mismatch between DNS names and IP addresses. That can cause problems with the certificate, which depends on the hostname/IP and DNS.

OriolFM commented 2 years ago

I finally found the solution to the issues.

  1. The machines that did not connect were not properly configured. Normally, each machine has a small IoT gateway at the input that forwards certain ports. The vendor configured the server to create the certificates with their local IP (inside their gateway) and not with the factory IP. They told me how to change their setup file so it would also include the IP from the outside, and that seemed to work.

  2. Now for the tricky one: the regular timeouts. It turns out that the PC time was ALMOST right, but out of sync by several seconds (less than a minute, but close to it), because it was trying to sync with time.windows.com and not with the network gateway (the machine is not connected to the internet). When I sent a read request, its duration plus the extra seconds from the out-of-sync timestamps exceeded the timeout value, so a timeout was reported even though there really wasn't one.
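A rough model of point 2, assuming the timeout is judged by comparing timestamps taken on two different clocks. The thread does not show where exactly that check happens (client, server, or both), and all values below are hypothetical.

```typescript
// Rough model: a check that subtracts a timestamp taken on one clock from "now"
// on another clock sees the real elapsed time plus the offset between the clocks.
// clockOffsetMs > 0 means the checking side's clock is ahead of the sender's.
function apparentElapsedMs(actualElapsedMs: number, clockOffsetMs: number): number {
  return actualElapsedMs + clockOffsetMs;
}

function reportsTimeout(actualElapsedMs: number, clockOffsetMs: number, timeoutMs: number): boolean {
  return apparentElapsedMs(actualElapsedMs, clockOffsetMs) > timeoutMs;
}

const timeoutMs = 10_000; // hypothetical timeout
const actualMs = 2_000;   // hypothetical real duration of the read
console.log(reportsTimeout(actualMs, 0, timeoutMs));      // false: clocks in sync
console.log(reportsTimeout(actualMs, 50_000, timeoutMs)); // true: clocks ~50 s apart
```

In this model a read that really takes 2 s looks like it took 52 s, so any timeout below that fires even though the server answered quickly.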

In any case, thank you very much for the help with this issue, sorry for the annoyance, and thank you again for the nodes. They're very useful.

mikakaraila commented 2 years ago

Good that you found the root causes. Those were really tricky ones.

Mbonss commented 2 years ago

I am currently experiencing similar issues (timeouts and being unable to connect).

Can you tell me in what way to change the setup file? Is this also where you put the settings for the time synchronisation of your machine?

OriolFM commented 2 years ago

> Can you tell me in what way to change the setup file?

This had to do with how the hardware and the OPC-UA server are configured by that specific vendor. They have the OPC-UA server behind a gateway that forwards port traffic into the internal network, and the PC generated the certificates with only the local IP, not the external IP of the gateway. The vendor pointed out that there is a specific .ini file in the config folder of the OPC-UA server directory, and I had to add the following lines at the end, where "xxx.xxx.xxx.xxx" is the external IP of the machine:

[AdditionalCNames]
AdditionalCNames = xxx.xxx.xxx.xxx

After rebooting the PC, the OPC-UA server loaded that additional parameter and that solved the problem for me.

Before trying to modify any setup files, I think you should contact the vendor to see if it will work with their server.

> Is this also where you put the settings for the time synchronisation of your machine?

No, that's just plain old Windows Embedded Date & Time settings.

Go to Control Panel > Date and Time > Internet Time > Change settings, then select the same time server you use for the Node-RED computer. Then click "Update now" to sync. By default, the time is synced once a week, but you can sync more often by editing the registry.
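Background for the sync step: the Windows time service synchronises using NTP. As a sketch, the standard NTP calculation (RFC 5905) of how far a PC clock is from the time server, from the four timestamps of one request/response exchange. The example numbers are hypothetical: a PC 40 s behind, with 5 ms network latency each way and 1 ms of server processing.

```typescript
// Standard NTP offset and round-trip delay (RFC 5905). All values in ms.
// t0: client sends request (client clock)   t1: server receives it (server clock)
// t2: server sends reply (server clock)     t3: client receives reply (client clock)
function ntpOffsetAndDelay(t0: number, t1: number, t2: number, t3: number) {
  const offsetMs = ((t1 - t0) + (t2 - t3)) / 2; // > 0: the client clock is behind the server
  const delayMs = (t3 - t0) - (t2 - t1);        // network round trip, server processing excluded
  return { offsetMs, delayMs };
}

// Hypothetical exchange: PC 40 s behind, 5 ms latency each way, 1 ms server processing.
console.log(ntpOffsetAndDelay(0, 40_005, 40_006, 11)); // { offsetMs: 40000, delayMs: 10 }
```

An offset in the tens of seconds, like the one in this example, is the kind of drift described below.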

In my case, the PC's motherboard clock battery was low, and after the power went off during scheduled maintenance, the clock had drifted. That timestamp difference produced constant timeout errors (or completely prevented the connection).