will / crystal-pg

a postgres driver for crystal
BSD 3-Clause "New" or "Revised" License

Add pipeline support #253

Open jgaskins opened 2 years ago

jgaskins commented 2 years ago

This is a rough sketch for query pipelining support. Pipelined queries in most DBs are implemented as:

  1. Serialize all of the queries, including parameters, and write them to the wire
  2. Read and deserialize all of the results from the wire
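A minimal sketch of those two phases (the `send_query`/`read_result` names here are illustrative, not crystal-pg internals):

```crystal
# Illustrative only: `send_query` and `read_result` are hypothetical names.

# Phase 1: serialize every query up front; nothing waits on the server.
queries.each do |query|
  connection.send_query(query.sql, query.params)
end

# Phase 2: the server replies in order, so read results back in the same order.
results = queries.map { connection.read_result }
```

The win comes from paying the network round-trip latency once for the whole batch instead of once per query.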

Benchmark

Time spent on 100k SELECT 42 queries (lower is better):

                user     system      total        real
serial      0.172153   0.238778   0.410931 (  1.462518)
pipelined   0.084153   0.089424   0.173577 (  0.649260)

That's 55% less wall-clock time and 58% less CPU time with localhost latency. For databases where latency might be in the 1-2ms range, the difference will probably be orders of magnitude. In fact, it might've been a more dramatic difference but I had to split the pipelines into chunks — see "known issues" below.

Usage

result_sets = pg.pipeline do |pipe|
  pipe.query "SELECT 42"
  pipe.query "SELECT * FROM posts LIMIT $1", limit
end

result_sets.scalar(Int32) # => 42
result_sets.read_all(Post) # => [Post(@id=1, ...), Post(@id=2, ...), ...]

Via crystal-lang/crystal-db:

db = DB.open("postgres:///")
db.using_connection do |connection|
  result_sets = connection.pipeline do |pipe|
    # ...
  end

  result_sets.read_one(...)
end

The API I had in mind here is that query_one, query_all, and query_each all become query inside the pipeline block, and read_one, read_all, and read_each are used when you're consuming results.
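Assuming that mapping, a caller migrating to the pipeline API might look like this (the Post type is illustrative; the pipelined half mirrors the usage example above):

```crystal
# Today, without pipelining: each call is a full round trip.
answer = pg.query_one "SELECT 42", as: Int32
posts  = pg.query_all "SELECT * FROM posts LIMIT $1", limit, as: Post

# With this PR: queue queries first, then consume results in the same order.
result_sets = pg.pipeline do |pipe|
  pipe.query "SELECT 42"                           # was query_one
  pipe.query "SELECT * FROM posts LIMIT $1", limit # was query_all
end
answer = result_sets.scalar(Int32)
posts  = result_sets.read_all(Post)
```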

Known issues/Remaining work to do

Closes #155

will commented 1 year ago

So sorry @jgaskins I missed this being opened completely until just now!

I think it'd be cool to get pipelining in for sure. I think your approach here of a different interface for it is probably the only sane way to do it. I could never come up with a safe way to auto-pipeline queries.

Not having exec is fine, I think.

If we can't figure out the hang, maybe we can put a limit on how many queries can be in the pipe?
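A cap like that could be layered on top of the proposed API without changing it; a hypothetical sketch (MAX_PIPELINE_SIZE and the flat queries array are assumptions, not anything in this PR):

```crystal
MAX_PIPELINE_SIZE = 1_000 # hypothetical cap to keep the pipe bounded

queries.each_slice(MAX_PIPELINE_SIZE) do |chunk|
  result_sets = pg.pipeline do |pipe|
    chunk.each { |sql| pipe.query sql }
  end
  # Fully consume this chunk's result sets before queueing the next chunk,
  # so at most MAX_PIPELINE_SIZE responses are ever outstanding.
end
```

This is essentially what the benchmark above already does by splitting the pipelines into chunks.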

On a mechanism to ensure all result sets are consumed: I don't have a strong opinion on this.