gopcua / opcua

Native Go OPC-UA library
MIT License
822 stars 251 forks source link

Possible flaw in autoreconnect logic #598

Open R290 opened 1 year ago

R290 commented 1 year ago

Related to https://github.com/influxdata/telegraf/issues/11559 and possibly related to https://github.com/gopcua/opcua/issues/597 and https://github.com/gopcua/opcua/issues/434.

The errors below are generated using the slightly modified read example here https://github.com/R290/opcua/tree/autoreconnect/examples/read with the docker-compose in the same folder.

The output with debugging disabled:

Starting acquisition
time.Date(2022, time.August, 13, 8, 29, 21, 403337000, time.UTC)
Starting acquisition
time.Date(2022, time.August, 13, 8, 29, 31, 403580000, time.UTC)

When restarting the opcua container:

Starting acquisition
time.Date(2022, time.August, 13, 8, 30, 41, 724244000, time.UTC)
Starting acquisition
Read failed: The operation could not complete because the client is not connected to the server. StatusBadServerNotConnected (0x800D0000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
time.Date(2022, time.August, 13, 8, 31, 41, 715146000, time.UTC)
Starting acquisition
Read failed: The operation could not complete because the client is not connected to the server. StatusBadServerNotConnected (0x800D0000)

It does not recover cleanly.

Normal behavior with debugging enabled:

Starting acquisition
debug: uasc 1/9: send *ua.ReadRequest with 108 bytes
debug: uacp 1: recv MSGF with 86 bytes
debug: uasc 1/9: recv MSGF with 86 bytes
debug: uasc 1/9: recv *ua.ReadResponse
debug: uasc 1/9: sending *ua.ReadResponse to handler
time.Date(2022, time.August, 13, 8, 33, 40, 166392000, time.UTC)

When restarting the opcua container the debug console is flooded witht the following repeated message patten:

debug: client: monitor: disconnected
debug: client: monitor: auto-reconnecting
debug: client: monitor: action: createSecureChannel
debug: uacp 541: close
debug: publish: PublishResponse: {"ResponseHeader":{"Timestamp":"2022-08-13T08:33:58.30056Z","RequestHandle":9,"ServiceResult":2155413504,"ServiceDiagnostics":{"EncodingMask":0,"SymbolicID":0,"NamespaceURI":0,"Locale":0,"LocalizedText":0,"AdditionalInfo":"","InnerStatusCode":0,"InnerDiagnosticInfo":null},"StringTable":null,"AdditionalHeader":{"EncodingMask":0,"TypeID":{"NodeID":"i=0","NamespaceURI":"","ServerIndex":0},"Value":null}},"SubscriptionID":0,"AvailableSequenceNumbers":null,"MoreNotifications":false,"NotificationMessage":{"SequenceNumber":0,"PublishTime":"0001-01-01T00:00:00Z","NotificationData":null},"Results":null,"DiagnosticInfos":null}
debug: publish: error: no subscriptions but the publishing loop is still running: There is no subscription available for this session. StatusBadNoSubscription (0x80790000)
debug: sub: error: There is no subscription available for this session. StatusBadNoSubscription (0x80790000)
debug: sub: pause
debug: sub: pause: pause
debug: uasc 541: readChunk EOF
debug: uasc 541: Close()

Via Wireshark at localhost:3000 we can also see that the server is being swamped with packages (using filter tcp.port == 4840). The reason Nginx is present is so we can simulate a dropped connection as well by restarting the nginx container. Behavior is similar to restarting the opcua server itself.

To me it seems that flooding the server with requests on a dropped connection is definitely not as designed...

Scaredon commented 7 months ago

same issue here. OPCUA Server gets flooded with wrong session ids.

R290 commented 7 months ago

Just checked with v0.5.2 and the flooding of the server no longer occurs. @Scaredon which version are you using? The read example at https://github.com/gopcua/opcua/blob/main/examples/read/read.go has also been modified to make a distinction between different error types.

Scaredon commented 7 months ago

@R290 I've retested today. newest telegraf plugin with v0.5.2 goopcua. Local Build. Autoreconnect is set to true.

  1. start opcuaserver
  2. start telegraf
  3. stop opcuaserver
  4. start opcuaserver

The server is spamed with the sequence (not sure if this is the correct order, log frequency is pretty high):

R290 commented 6 months ago

Sadly, I'm unable to reproduce this. Could you provide any information about the server and authentication method you are using?