inejge / ldap3

A pure-Rust LDAP library using the Tokio stack
Apache License 2.0

Avoid spawning drive() #105

Closed mdecimus closed 1 year ago

mdecimus commented 1 year ago

Hi,

Is there a way to avoid spawning the drive function each time an LDAP query needs to be done? I tried manually calling `turn(LoopMode::SingleOp)` after sending a request, but the request gets stuck.

I understand the reasons you explained in https://github.com/inejge/ldap3/issues/53 but, since this is a client library, I believe that the async version should work just like the sync one and wait until the result is available. Spawning a future would make sense if there were events being pushed from the server, which I don't believe is the case in LDAP.
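For context, the pattern under discussion looks roughly like this (a sketch against recent versions of the crate; the URL, base DN, and filter are placeholders):

```rust
use ldap3::{LdapConnAsync, Scope};

#[tokio::main]
async fn main() -> ldap3::result::Result<()> {
    // One connection, one background driver task.
    let (conn, mut ldap) = LdapConnAsync::new("ldap://localhost:2389").await?;
    ldap3::drive!(conn); // spawns the task this issue is about

    // Every subsequent operation on this connection awaits its own result.
    let (rs, _res) = ldap
        .search("dc=example,dc=org", Scope::Subtree, "(objectClass=*)", vec!["cn"])
        .await?
        .success()?;
    println!("{} entries", rs.len());
    ldap.unbind().await
}
```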

Thanks!

inejge commented 1 year ago

> Is there a way to avoid spawning the drive function each time an LDAP query needs to be done?

Spawning the `drive!` task is done once per connection, not once per query. It's done in order to implement the multiplexed nature of an LDAP connection.

> I believe that the async version should work just like the sync one and wait until the result is available. Spawning a future would make sense if there were events being pushed from the server, which I don't believe is the case in LDAP.

If you look at the sync code, you'll see that it uses the async elements internally, including spawning the `drive!` task once the connection is established.

As for server-initiated messages, they exist: RFC 4511, Section 4.4, describes unsolicited notifications which can be sent independently of any client-initiated exchange.

mdecimus commented 1 year ago

Thank you for the reply.

I totally understand, and I agree that spawning drive is the right design choice for handling multiplexed responses. However, I believe that spawning a task, even once per connection, may introduce overhead for simpler workloads such as performing a single query.

Given the significant variation in the scale and complexity of tasks for which this library might be used, I would like to propose an enhancement to make spawning the drive task optional when multiplexed messages or server-initiated messages are expected. I believe it would be a valuable addition for developers who are using this library for applications where performance is critical.

I understand that this might involve a considerable amount of work, but I believe that this enhancement is worthwhile as it would greatly increase the library's flexibility and adaptability to different use case scenarios.

I'd be more than happy to hear your thoughts on this and would be willing to assist in any way that I can, if this is something you'd be interested in pursuing.

Thanks again for the library, and I'm looking forward to hearing your thoughts on this suggestion.

inejge commented 1 year ago

If we're discussing adaptability, let's first agree on some lower limits to complexity: this library is decidedly std-only, therefore a process will have at least a couple of MB for code and heap space. With that in mind:

> I would like to propose an enhancement to make spawning the drive task optional when multiplexed messages or server-initiated messages are expected.

[Not expected, surely?] If you forgo multiplexing and keep Tokio as the async executor, the savings are meagre: a few KB because of simpler I/O code (the rest of decoding machinery must still exist), a few more KB because you don't need the connection-wide message-id to channel hashmap.
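The message-id to channel hashmap mentioned above can be sketched in miniature (a simplified illustration with made-up types, not the library's actual internals; real responses are decoded LDAP messages, not strings):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Sketch of the demultiplexing scheme: the driver task owns a map from
// message id to the channel of the waiting caller, and routes each
// decoded response to the right one.
struct Demux {
    next_id: i32,
    pending: HashMap<i32, Sender<String>>, // response type simplified to String
}

impl Demux {
    fn new() -> Self {
        Demux { next_id: 1, pending: HashMap::new() }
    }

    // Caller side: register interest and get a receiver to wait on.
    fn register(&mut self) -> (i32, Receiver<String>) {
        let id = self.next_id;
        self.next_id += 1;
        let (tx, rx) = channel();
        self.pending.insert(id, tx);
        (id, rx)
    }

    // Driver side: a decoded message with this id goes to the waiting caller.
    fn dispatch(&mut self, id: i32, msg: String) {
        if let Some(tx) = self.pending.remove(&id) {
            let _ = tx.send(msg);
        }
    }
}

fn main() {
    let mut demux = Demux::new();
    let (id_a, rx_a) = demux.register();
    let (id_b, rx_b) = demux.register();
    // Responses can arrive out of order; each still reaches its caller.
    demux.dispatch(id_b, "result for B".into());
    demux.dispatch(id_a, "result for A".into());
    assert_eq!(rx_a.recv().unwrap(), "result for A");
    assert_eq!(rx_b.recv().unwrap(), "result for B");
}
```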

I suppose that using a custom bare-bones executor would produce a more significant simplification (but not if you want to keep TLS!) at the cost of rearchitecting the library to be more executor-agnostic. I'm not prepared to go there, not until it's clearer what kind of abstractions some future iteration of async Rust is going to provide.

With those constraints, I don't think that trying to simplify I/O interaction is worthwhile.

mdecimus commented 1 year ago

> [Not expected, surely?] If you forgo multiplexing and keep Tokio as the async executor, the savings are meagre: a few KB because of simpler I/O code (the rest of decoding machinery must still exist), a few more KB because you don't need the connection-wide message-id to channel hashmap.

Even without benchmarking the code, I am quite certain that writing to and reading from a socket directly will always be faster than splitting the socket, spawning a task, and exchanging messages through channels. I am not talking about memory usage; this is about performance.
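The direct style being argued for could be sketched like this (a hypothetical illustration with made-up length-prefix framing; real LDAP uses BER-encoded messages, and this is not the library's API):

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};

// Write a length-prefixed request and block on the reply on the same
// stream, with no background task or channels in between.
fn single_op<S: Read + Write>(stream: &mut S, request: &[u8]) -> io::Result<Vec<u8>> {
    stream.write_all(&(request.len() as u32).to_be_bytes())?;
    stream.write_all(request)?;
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut resp = vec![0u8; len];
    stream.read_exact(&mut resp)?;
    Ok(resp)
}

// In-memory echo transport standing in for a socket, so the sketch runs
// without a server: everything written becomes readable.
struct Echo {
    buf: VecDeque<u8>,
}

impl Write for Echo {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend(data.iter().copied());
        Ok(data.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

impl Read for Echo {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        let n = out.len().min(self.buf.len());
        for b in out.iter_mut().take(n) {
            *b = self.buf.pop_front().unwrap();
        }
        Ok(n)
    }
}

fn main() {
    let mut stream = Echo { buf: VecDeque::new() };
    // The echo transport returns the request as the "response".
    let resp = single_op(&mut stream, b"search request").unwrap();
    assert_eq!(resp, b"search request".to_vec());
}
```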

If you look at other async libraries on crates.io, it is extremely rare for a client library to require spawning a task for something as trivial as performing a query. For example, IMAP is also a multiplexed protocol with server-initiated messages, but async libraries don't force users to spawn a task when they want to do something simple like reading a single email.

inejge commented 1 year ago

> Even without benchmarking the code, I am quite certain that writing to and reading from a socket directly will always be faster than splitting the socket, spawning a task, and exchanging messages through channels. I am not talking about memory usage; this is about performance.

I expect that network round-trip latency would dominate in the response time, especially for simple searches where the server has everything indexed and RAM-resident. Benchmarks would be the preferred way to establish a performance baseline: e.g., user/kernel CPU usage when a gigabit connection is saturated, or when the request load is kept steady at 100k RPS, for the expected operation mix. Then you can substitute a simplified I/O driver and retest against the same target(s). Alternatively, performance targets stemming from design requirements could be used.
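As a toy illustration of the kind of baseline being suggested, one can compare a direct call against the same work routed through a worker thread and channels (synthetic workload, purely illustrative; a real benchmark would use the actual operation mix and a proper harness):

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Instant;

// Stand-in for "processing one request".
fn work(x: u64) -> u64 {
    x.wrapping_mul(2654435761)
}

fn main() {
    const N: u64 = 100_000;

    // Baseline: call the work directly.
    let t = Instant::now();
    let mut acc = 0u64;
    for i in 0..N {
        acc = acc.wrapping_add(work(i));
    }
    let direct = t.elapsed();

    // Same work routed through a worker thread and per-request channels,
    // as a task-based driver would do.
    let (req_tx, req_rx) = channel::<(u64, Sender<u64>)>();
    let worker = thread::spawn(move || {
        for (x, resp_tx) in req_rx {
            let _ = resp_tx.send(work(x));
        }
    });
    let t = Instant::now();
    let mut acc2 = 0u64;
    for i in 0..N {
        let (resp_tx, resp_rx) = channel();
        req_tx.send((i, resp_tx)).unwrap();
        acc2 = acc2.wrapping_add(resp_rx.recv().unwrap());
    }
    let channeled = t.elapsed();
    drop(req_tx);
    worker.join().unwrap();

    // Both paths compute the same result; only the overhead differs.
    assert_eq!(acc, acc2);
    println!("direct: {:?}, via channels: {:?}", direct, channeled);
}
```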

I don't think I would seriously consider reworking the I/O architecture without seeing convincing evidence of a performance improvement, and even then it mustn't come at the expense of backwards compatibility or API ergonomics.

mdecimus commented 1 year ago

> I expect that network round-trip latency would dominate in the response time, especially for simple searches where the server has everything indexed and RAM-resident.

That is true, but what I mean is that all the extra work the library does to read a response takes resources away from other tasks running in the program. For a small number of requests this won't matter, of course, but if, as in my use case, you need to handle thousands of requests per second, it adds up and ends up impacting performance. I recently found a Rust library that was cloning Strings all over the place rather than passing &str references; the cost of any single clone is negligible, but in a high-throughput environment it does have an impact and degrades performance. If one keeps adding these small inefficiencies all over the place, everything slows down.
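The String-cloning anecdote can be shown in miniature (hypothetical helper names, purely illustrative):

```rust
// Taking ownership forces the caller to clone if it still needs its copy:
// every call allocates and copies the bytes.
fn label_clone(s: String) -> usize {
    s.len()
}

// Borrowing reuses the caller's existing buffer: no allocation, no copy.
fn label_borrow(s: &str) -> usize {
    s.len()
}

fn main() {
    let dn = String::from("cn=admin,dc=example,dc=org");
    let a = label_clone(dn.clone()); // allocates a second buffer per call
    let b = label_borrow(&dn);       // just passes a pointer and length
    assert_eq!(a, b);
}
```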

> I don't think I would seriously consider reworking the I/O architecture without seeing convincing evidence of a performance improvement, and even then it mustn't come at the expense of backwards compatibility or API ergonomics.

No worries, I'll just fork the library and simplify the API for my use case.