Open wyfo opened 2 years ago
Kind of duplicate of #475, but implementations are quite different
Hey, I was about to open an issue around the fact that &'static str
is the default case in most instances, or at least it should be (since if you're building queries with format strings you're potentially going to have injections), and therefor Query should accept a &str. This is particularly nice if you're calling the same query many times but with different parameters.
I also agree that the cache should hold Arc<PreparedStatement>
to make it cheaper to clone.
I have a couple of notes on your code: https://github.com/wyfo/scylla-rust-driver/blob/optimize_caching_session/scylla/src/transport/session.rs#L595
prepare
is a bit unfortunate in that it takes impl Into<Query>
but never actually needs ownership of Query. That's because Connection::send_request
borrows the request. So at no point is an owned buffer actually necessary at all.
The main benefit of taking impl Into<Query>
is that it lets you provide static strings, but it means that every called to impl Into<Query>
is forcing an allocation. That's still happening with the approach you've taken.
This is why I went down the route of making impl Into<Query>
cheap. Unfortunately, I've lost that code. Whoops.
I'd like to hear from the Scylla devs on this, if there's any interest or not. I think there are a few things to do here:
Moving PreparedStatement into Arc should likely be its own PR, it should be a very straightforward win. There is a clone of PreparedStatement for every add_prepared_statement
call, whether it was previously prepared or not. PreparedStatement
is a very large structure that itself has many pointers, including Arc<dyn _>
pointers, so this should just strictly be a win.
Determining what amount of API breakage, if any, is acceptable. In the approach I took I had a Cow<'a, String>
in order to minimize API breakage. I think optimizing for the &'static
or &'a
cases makes the most sense for two reasons.
The changes to to avoid those clones and copies would be:
Query
to Query<'a>
Session::prepare
to not require ownership of the Query. This will help in the case where a String
is used as the buffer.I took a quick pass at redoing the work. https://github.com/scylladb/scylla-rust-driver/compare/main...colin-grapl:scylla-rust-driver:remove-allocations
This is a breaking API change. In particular, if you were calling .into()
on queries before passing them into execute
, you have to not do that. I have also added lifetimes to a number of structs.
CachingSession
is currently suboptimal, especially when using it&'static str
(and I suppose it's the main use case), mostly for two reasons:Into<Query>
implies a string cloning in case of&str
; however, making the query is not necessary to perform cache lookup, only the string reference is necessaryPreparedStatement
s, which have to be cloned when retrieved (andPreparedStatement
is quite heavy to clone).I've implemented an optimization to solve this issue, in which I've made the following modifications:
trait QueryString {fn query_string(&self) -> &str;}
and usequery: impl Into<Query> + QueryString
inCachingSession::Execute
(and others) signature; cache lookups are performed usingquery.query_string()
, andquery.into()
is only used when the statement must be preparedArc<PreparedStatement>
into the cacheI've done some benchmarks (using my old i7 4650U MacBook Air), in which I've measured only one prepared statement retrieval (no serialization and database access) from
&'static str
. Here are the results :I let you judge if the optimization could be worth it. The implementation is just a draft and maybe adapted (I can open a PR).