locka99 / opcua

A client and server implementation of the OPC UA specification written in Rust
Mozilla Public License 2.0
476 stars 129 forks source link

Client Session/Tcptransport clones of session_state become out of sync if the server restarts #243

Open joshuagleaton opened 1 year ago

joshuagleaton commented 1 year ago

When a client is connected to a peer, if the peer disconnects, the reconnect logic calls 'reconnect_and_activate', which calls 'reset()', which creates a new session_state, but the session.transport is not updated with the new session state, leading to inconsistencies.

I have a patch for this here. By adding a Tcptransport::reset() that updates the session_state (and the dependent message_queue and connection_state.

The bug can be demonstrated by commenting out session.rs:223. Running simple-client against simple-server, with line 223 commented, when restarting the server, the client will panic:

Created a subscription with id = 1
Data change from server:
Item "ns=2;s=v4", Value = Double(-0.893559520249049)
Item "ns=2;s=v1", Value = Int32(28)
Item "ns=2;s=v2", Value = Boolean(true)
Item "ns=2;s=v3", Value = String(UAString { value: Some("Hello World times 1") })
Data change from server:
Item "ns=2;s=v2", Value = Boolean(false)
Item "ns=2;s=v4", Value = Double(0.21200710992205463)
Item "ns=2;s=v3", Value = String(UAString { value: Some("Hello World times 2") })
Item "ns=2;s=v1", Value = Int32(35)
Data change from server:
Item "ns=2;s=v2", Value = Boolean(true)
Item "ns=2;s=v1", Value = Int32(42)
Item "ns=2;s=v4", Value = Double(0.9950138797069145)
Item "ns=2;s=v3", Value = String(UAString { value: Some("Hello World times 3") })
thread 'main' panicked at 'assertion failed: self.session_state.read().id() == self.transport.session_id()', lib/src/client/session/session.rs:225:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

While this patch seems to work, it exposes a large issue. As the reconnect logic progresses, it eventually deadlocks on session.rs:1442 : SessionService::create_session::spawn_session_activity_task, which tries to access the runtime, but cannot because the TcpTransport::connect has spawned a thread that locks the runtime during the life of the connection (spawn_looping_tasks). For that reason I currently have the transport reset disabled so that the client can be restarted externally.

If there are suggestions on how to work around this I'm happy to test.

joshuagleaton commented 1 year ago

possibly related to #189