locka99 / opcua

A client and server implementation of the OPC UA specification written in Rust
Mozilla Public License 2.0
506 stars 136 forks source link

BadSecurityChecksFailed when checking signature #89

Closed svanharmelen closed 3 years ago

svanharmelen commented 3 years ago

I'm using the client to connect to a 3rd party OPC UA server and for a couple of connections we get a lot of these kinds of errors:

[2021-01-21T20:34:35.905Z][INFO]-[ERROR opcua_client::comms::tcp_transport] Expecting a chunk, got an error message BadSecurityChecksFailed
[2021-01-21T20:34:35.905Z][INFO]-[ERROR opcua_client::comms::tcp_transport] Reader is putting connection into a finished state with status BadSecurityChecksFailed
[2021-01-21T20:34:35.905Z][INFO]-[DEBUG opcua_client::comms::tcp_transport] Reader is sending a quit to the writer
[2021-01-21T20:34:35.905Z][INFO]-[ERROR opcua_client::comms::tcp_transport] Read loop error Kind(ConnectionReset)
[2021-01-21T20:34:35.905Z][INFO]-[INFO  opcua_client::comms::tcp_transport] ReadState has dropped
[2021-01-21T20:34:35.905Z][INFO]-[DEBUG opcua_client::comms::tcp_transport] Read loop ended with an error

A lot means every 30 to 60 seconds (give or take). In between a couple of hundred measurements are successfully received but after some time it gets this error and then reconnects.

Any idea what could be the problem or how I can troubleshoot the issue?

Thanks!

svanharmelen commented 3 years ago

@locka99 and thoughts on this one? Don't expect a solution or definitive answer/reply, but a pointer of some context would be very helpful to get my started... Thanks!

svanharmelen commented 3 years ago

In another connection (to a different 3rd party OPC UA server) I also see these errors every once and a while:

[2021-01-25T08:41:48.932Z][INFO]-[ERROR opcua_crypto::security_policy] Signature invalid [209, 60, 212, 102, 88, 21, 13, 153, 31, 45, 129, 235, 139, 102, 154, 126, 121, 206, 63, 227, 79, 94, 86, 172, 93, 123, 182, 9, 51, 203, 18, 220]

Can these be related? And again, any place where I should start debugging these? Thanks!

locka99 commented 3 years ago

If you have wireshark you can try capturing some traffic and see if there is anything obvious about the packet that would make it fail. I had put a link to some wireshark info in the developer section before Christmas:

https://github.com/locka99/opcua/blob/fb76c892150fec1bf477f827e8f6d1395c5f55a7/docs/developer.md

Basically Wireshark has an OPC UA protocol debugger in it that you can use. It only shows everything when the connection is plain text but you should see header info even when the payload is encrypted and you might spot something odd happening. Before you do anything with Wireshark you have to edit the settings for the OPC UA to set the port # to the one you connect to the server with the same as the one you are communicating with the server and then you can start to capture traffic.

Also, you can enable TRACE level debugging in the OPC UA for Rust lib. How you do that depends on what logger you're using but for the console logger you'd set the RUST_OPCUA_LOG=trace environment variable. If you are using log4rs you can checkout what the demo-server does to enable it

svanharmelen commented 3 years ago

Thanks so much for the pointers @locka99! I’m going to give it a shot tomorrow morning. Will let you know what I can find...

svanharmelen commented 3 years ago

Thanks again for the pointers as it helped me to understand what is going on... It turns out that the server I am connecting to (some embedded 3rd party OPC implementation) is not using the correct time (it's about ~6 minutes behind). So the client fails to create a secure channel as the responses seem to be ignored. I didn't actually verify that they are indeed simply ignored, but the responses aren't processed and the client keeps sending OpenSecureChannel messages.

Now I updated the timeout_hint send to the server as a way to circumvent the problem, but then I got the BadSecurityCheck error again. So I'm now wondering if they could be caused by the time difference? It seems to be "fixed" when I update the clock on the client host to be in sync with the server, so I'm guessing time plays a role, just not sure which one...

I would like our client to be super robust, so I was thinking about keeping track of the server time since we receive that back from the server. Maybe as a real timestamp but more likely as an offset/diff that can be used to set a relative timestamp when answering to the server. But that would only work if the encryption can also use that same relative timestamp. Any idea if something like that is possible?

Not (yet) asking if you would be open to merge a PR adding such logic, but first trying to find a solution and see what it would look like. Anything that allows us to create a working connection without having to update or change anything on the server would be an acceptable solution for this use case.

svanharmelen commented 3 years ago

@locka99 so I got around to testing this and it actually worked like a charm. So that means that if we are somehow able to use a time offset in the client, we should be able to allow the client to just "follow" the server. A bit inline with what is documented in the OPC UA docs here.

I tested it really, really nasty by applying a hard coded fixed offset (of 400 seconds) in the encode and decode methods of the RequestHeader, ResponseHeader and ChannelSecurityToken types. Updating these was enough to fix the problem I encountered with the 3rd party OPC UA server. So I was able to create a secure channel, a subscription and monitor several items without issues after changing the encoding/decoding for these types.

Of course this is not a good solution going forward, but the question is what would be the best place to add something in order to make this possible?

One solution could be to add an offset parameter to the encode and decode methods of the BinaryEncoder trait and then update all calls to that so it always passes an offset (inline with how DecodingLimits is already used) which would always be zero, unless a client config option is enabled that sets the offset to the time diff between the client and the server. Of course the offset could also be placed in a DecodeOptions struct, so that any future options (if any) do not require yet another method signature change.

Another solution could be to not use DateTime::now() in the client when needing the current time, but a custom function which returns the current time with an applied offset. Again, mostly being zero unless a client config option is enabled.

While the latter may seem cleaner, a lot of code is shared between the server and the client and so it would probably not be enough to "only" update the DateTime::now() calls in the client package, but it would probably require all the calls to DateTime::now() to be updated.

So I can see how doing it during the encoding/decoding could actually be quite a nice solution, also because then you don't have to think about it anywhere in the rest of the code and don't need to use custom functions for handling or getting the time. I can definitely try to make a PR for this so we can see what that would look like, but it would be very nice to first know if you are open to this approach... As otherwise I can have a look the other option (which also seems feasible) or at whatever other idea you may have instead.

Of course please also let me know if you don't want to add anything like this at all. As in that case we will need to make a fork anyway, so then we can just choose whatever solution we like best ourselves. Thanks again and curious what your thoughts are!

svanharmelen commented 3 years ago

Had some time to play with it today and noticed that all the DateTime::now() uses, are already using a custom version declared here, so that actually makes things a lot easier... I'll try to make a PR to see what it would look like.