locka99 / opcua

A client and server implementation of the OPC UA specification written in Rust
Mozilla Public License 2.0

It takes too long to send a request #40

Open nkbai opened 4 years ago

nkbai commented 4 years ago

Steps to reproduce:

  1. connect to an endpoint
  2. create a subscription
  3. add 1000 items to monitor
  4. call Session::run_async in the background
  5. sleep 5 seconds
  6. call Session::read to read any node
  7. get the value of the node

The session takes 3 or more seconds to send the ReadRequest (captured with Wireshark). It seems some lock isn't being released.

If steps 6 and 7 happen before step 4, the read returns very fast.

locka99 commented 4 years ago

All of the client's session API is synchronous, so if you call create_monitored_items the code will wait for the server's response before the next command is processed.

Internally, the session does asynchronously place requests into a queue but I haven't exposed that functionality yet.

nkbai commented 4 years ago

It takes more than 3 seconds to send out the request, not to receive the response. For example: I call read at 3:00.000, the request is sent at 3:03.120 (.120 means 120 ms), and I receive the read response at 3:03.700.

locka99 commented 4 years ago

I created a test in opcua/integration that subscribes to 1000 variables from a client to a server and the entire duration of create_subscription() and create_monitored_items() is only 50ms on my PC.

$ cd opcua/integration
$ cargo test -- --exact --test-threads=1 --ignored tests::subscribe_1000

Could there be something else your software is doing with locks on the session, e.g. via a timer, which is making it run so slowly?

locka99 commented 4 years ago

The test uses an encrypted endpoint too, to ensure that the slowdown isn't caused by encryption.

nkbai commented 4 years ago

What I mean is that it takes too much time to send any request after the session starts. You must test reading or writing a variable after Session::run_async().

locka99 commented 4 years ago

Ok, I'll check. Though note that the code will open a session to call GetEndpoints and then open a second session using the desired matching endpoint description and security credentials.

locka99 commented 4 years ago

I've reworked Session::run_loop() and poll(). The loop used to call Session::poll() with the session locked, but poll() would thread-sleep for 50ms if there were no actions to do. So potentially the loop would lock the session for no reason for 50ms. I've made it so it locks, polls and releases, and then sleeps if there was no activity. That should mean that if you're doing stuff from another thread, it shouldn't have to wait on the lock for so long.
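The difference can be sketched with plain std threading. This is a minimal illustration, not the real opcua code: the Mutex here is a stand-in for the session, and `caller_wait` measures how long a second thread (e.g. one issuing a ReadRequest) waits for the lock under each behaviour.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Measures how long a second thread waits for the "session" lock while a
// polling loop runs. `sleep_while_locked == true` models the old behaviour
// (sleep with the lock held); `false` models lock, poll, release, then sleep.
fn caller_wait(sleep_while_locked: bool) -> Duration {
    let session = Arc::new(Mutex::new(0u32)); // stand-in for the Session
    let s = Arc::clone(&session);
    let poller = thread::spawn(move || {
        for _ in 0..10 {
            if sleep_while_locked {
                // old: hold the lock across the 50ms idle sleep
                let _guard = s.lock().unwrap();
                thread::sleep(Duration::from_millis(50));
            } else {
                // new: poll under the lock, release, then sleep unlocked
                drop(s.lock().unwrap());
                thread::sleep(Duration::from_millis(50));
            }
        }
    });
    thread::sleep(Duration::from_millis(25)); // let the poller get going
    let start = Instant::now();
    drop(session.lock().unwrap()); // e.g. a caller issuing a ReadRequest
    let waited = start.elapsed();
    poller.join().unwrap();
    waited
}

fn main() {
    println!("lock held during sleep:     waited {:?}", caller_wait(true));
    println!("lock released before sleep: waited {:?}", caller_wait(false));
}
```

With the lock held during the sleep, the caller typically waits tens of milliseconds just to acquire the lock; with the reworked pattern it acquires it almost immediately, which matches the delay pattern described in this issue.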

I don't know if this is the cause of the slowdown, but it wasn't good to hold the lock during a sleep, so it's an improvement in any case. Let me know if it fixes your issue.

nkbai commented 4 years ago

Maybe it's because the server is too slow. There are too many publish requests in flight waiting for publish responses, and that blocks the thread.

session_state.set_wait_for_publish_response(wait_for_publish_response);

Once wait_for_publish_response is set to false, it is never changed back to true. Is this correct?

locka99 commented 4 years ago

Yes, that looks like a bug. I don't know if it would cause a slowdown, because the publish requests get sent on a timer, so the worst that would happen is extra publish requests. IIRC the heuristic used by the client is to send publish requests for each subscription at the publishing interval using a timer. The server can either queue them up or respond that there are too many publish requests, in which case we stop sending until we receive responses to the previous requests. That is what the flag was supposed to be for.

I'll fix it so it is set back to true, but you might want to test that change locally and see if it resolves your issue. I thought the more likely cause was the locking code I tightened, which could inadvertently hold a lock during a sleep.
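The back-off heuristic described above can be sketched as a tiny state machine. The names here are illustrative, not the actual opcua session internals:

```rust
// Illustrative sketch of the publish back-off flag discussed above;
// PublishState and its methods are hypothetical, not the real opcua code.
pub struct PublishState {
    wait_for_publish_response: bool,
}

impl PublishState {
    pub fn new() -> Self {
        PublishState { wait_for_publish_response: false }
    }

    // The publish timer only sends while we are not backing off.
    pub fn should_send_publish(&self) -> bool {
        !self.wait_for_publish_response
    }

    // Server said there are too many outstanding publish requests: stop
    // sending. The bug was that this transition back to true never happened.
    pub fn on_too_many_publish_requests(&mut self) {
        self.wait_for_publish_response = true;
    }

    // A publish response arrived: resume sending.
    pub fn on_publish_response(&mut self) {
        self.wait_for_publish_response = false;
    }
}

fn main() {
    let mut state = PublishState::new();
    assert!(state.should_send_publish());
    state.on_too_many_publish_requests();
    assert!(!state.should_send_publish()); // timer skips sending while backing off
    state.on_publish_response();
    assert!(state.should_send_publish()); // resumes after a response arrives
    println!("ok");
}
```

Without the transition back to true, the client keeps firing publish requests at the publishing interval regardless of the server's back-pressure, which is the "extra publish requests" worst case mentioned above.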

nkbai commented 4 years ago

It may be because you use std::sync::Mutex in the tokio environment. You can use tokio::sync::Mutex instead of std::sync::Mutex; otherwise calling lock() will block the tokio executor.