locka99 / opcua

A client and server implementation of the OPC UA specification written in Rust
Mozilla Public License 2.0
476 stars 129 forks source link

Fix memory leaks in connection management code #201

Closed lovasoa closed 2 years ago

lovasoa commented 2 years ago

This revamps the connection handling on both the client and the server to guarantee that we never have rogue tokio tasks running.

This simplifies the connection management code by using native rust and tokio future cancelling to make sure all concurrent code stops when a connection dies, whatever the reason.

This allows removing a lot of complex connection state management code.

In the new design, if either the read or the write loop needs to terminate the connection, it can do so by simply returning from the current function.

Fixes #199 Fixes #179

This is a base for future pull requests, that I am afraid to do with the current complex state management code.

lovasoa commented 2 years ago

@locka99 : let me know if there is anything I overlooked. In addition to fixing logic bugs, this PR hopefully makes the code shorter and hopefully easier to contribute to in the future, and slightly improves performance, memory usage, and power usage because of the removed tokio tasks.

ctron commented 2 years ago

I just tried it out, and to me it looks like the reconnnect isn't properly working:

  1. start OPC UA server (not from this crate)
  2. start OPC UA client (built with this crate)
  3. restart OPC UA server

In that case, the client disconnects (detects the disconnection) and at some point tries to re-connect. But get's stuck with that:

[2022-05-06T11:03:28Z INFO  opcua::client::session_retry_policy] Retry retriggered by policy
[2022-05-06T11:03:28Z INFO  opcua::client::session::session] Retrying to reconnect to server...
[2022-05-06T11:03:28Z INFO  opcua::client::session::session_state] SessionState has dropped
[2022-05-06T11:03:28Z INFO  opcua::client::session::session] Connect
[2022-05-06T11:03:28Z INFO  opcua::client::session::session] Security policy = None
[2022-05-06T11:03:28Z INFO  opcua::client::session::session] Security mode = None
[2022-05-06T11:03:28Z ERROR opcua::client::comms::tcp_transport] Connected failed with status BadCommunicationError
[2022-05-06T11:03:28Z WARN  opcua::client::session::session] session:4 Reconnect was unsuccessful, retries = 2
ctron commented 2 years ago

Just to be clear. Although the OPC UA server is up again, it get's stuck in the state show above.

lovasoa commented 2 years ago

@ctron : I simplified connection management on the client, and fixed a reconnection issue. Can you try again ? You should also notice a performance increase and faster connections, due to less blocking and polling tasks.

If you experience more issues, could you provide the most verbose logs (with RUST_OPCUA_LOG=trace) to help with debugging ?

lovasoa commented 2 years ago

@locka99 : I know this getting a little big (with many more deletions than additions, but still). Would you be interested in a call to go over it together ?

image

locka99 commented 2 years ago

The integration tests are failing with timeouts - it might be worth seeing if the underlying reason has anything to do with this patch or something else

lovasoa commented 2 years ago

@locka99 : There was a potential deadlock on session_state. I fixed it.