typelevel / skunk

A data access library for Scala + Postgres.
https://typelevel.org/skunk/
MIT License

Inefficient use of Flush in extended query protocol #605

Open astiob opened 2 years ago

astiob commented 2 years ago

When executing a prepared statement, Skunk currently uses the following protocol flow:

  1. Parse
  2. Flush
  3. Await response and check for errors
  4. Describe statement
  5. Flush
  6. Await response and check for errors
  7. Bind
  8. Flush
  9. Await response and check for errors
  10. Execute
  11. Flush
  12. Await response and check for errors
  13. Close portal
  14. Flush
  15. Await response and check for errors
  16. Close statement
  17. Flush
  18. Await response and check for errors

This is needlessly inefficient, as it prevents batching of request commands and requires a full network round-trip on every step.
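For illustration, here is a rough sketch of what a pipelined exchange could look like if the frontend messages were buffered and written in a single batch terminated by one Sync. The `FrontendMessage` classes and the `PipeliningSocket` trait below are made-up stand-ins, not Skunk's actual internals, and the real Bind/Describe/Close messages carry more fields than shown.

```scala
import cats.effect.IO

// Illustrative stand-ins for the extended-query frontend messages.
sealed trait FrontendMessage
final case class Parse(name: String, sql: String)        extends FrontendMessage
final case class Bind(portal: String, statement: String) extends FrontendMessage
final case class Describe(portal: String)                extends FrontendMessage
final case class Execute(portal: String, maxRows: Int)   extends FrontendMessage
final case class Close(portal: String)                   extends FrontendMessage
case object Sync                                         extends FrontendMessage

// Hypothetical socket that buffers encoded messages until flush is called.
trait PipeliningSocket {
  def send(m: FrontendMessage): IO[Unit] // append to the outgoing buffer
  def flush: IO[Unit]                    // write the buffer to the wire
}

def executePrepared(sock: PipeliningSocket, sql: String): IO[Unit] =
  for {
    _ <- sock.send(Parse("stmt", sql))
    _ <- sock.send(Bind("portal", "stmt")) // parameter values omitted
    _ <- sock.send(Describe("portal"))
    _ <- sock.send(Execute("portal", maxRows = 0))
    _ <- sock.send(Close("portal"))
    _ <- sock.send(Sync)
    _ <- sock.flush // one network write, one round trip for the whole exchange
    // then read ParseComplete, BindComplete, RowDescription, DataRow*,
    // CommandComplete, CloseComplete and finally ReadyForQuery in order,
    // checking for ErrorResponse along the way
  } yield ()
```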

In addition, this currently fails to work—and instead hangs indefinitely—with a Pgpool-II server, which does not fully honour Flush. (This is currently being discussed on the pgpool-general mailing list.)

As far as I can tell, the only reason it is done this way is that MessageSocket does not allow queueing up response processing asynchronously and submitting more commands in the meantime. For example, skunk.net.protocol.Parse currently does this:

https://github.com/tpolecat/skunk/blob/9e0744d00f6876f1eda5016ea34d7f0ecac40736/modules/core/shared/src/main/scala/net/protocol/Parse.scala#L38-L55

where flatExpect suspends the fibre handling the current database connection until a response message is received. In an ideal world, it would schedule the block to run whenever the next message arrives while letting the sending fibre continue, so that further protocol commands could be queued and the Flush could be omitted.

Any idea as to how this could be allowed in a clean way? I’m thinking that the sending and receiving should be separate fibres, but I’m not well-versed enough to see immediately how to achieve that.
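One possible shape, sketched with plain cats-effect primitives (the `BackendMessage` type, `readMessage`, and all other names below are placeholders, not Skunk's real ones): handlers are queued in the order their requests were sent, and a dedicated receiver fibre completes them as responses arrive, so the sending fibre never has to block.

```scala
import cats.effect.{Deferred, IO}
import cats.effect.std.Queue
import cats.syntax.all._

final case class BackendMessage(raw: String) // placeholder for real messages

// Pending response handlers, completed in FIFO order by the receiver fibre.
class Demultiplexer(
  pending: Queue[IO, Deferred[IO, BackendMessage]],
  readMessage: IO[BackendMessage]
) {

  // Called by the sending fibre right after it writes a command: registers
  // interest in the next response without blocking, so more commands can be
  // queued immediately.
  def expectNext: IO[Deferred[IO, BackendMessage]] =
    for {
      d <- Deferred[IO, BackendMessage]
      _ <- pending.offer(d)
    } yield d

  // Runs on its own fibre: hands each incoming response to the oldest waiter.
  def receiverLoop: IO[Nothing] =
    (for {
      msg <- readMessage
      d   <- pending.take
      _   <- d.complete(msg)
    } yield ()).foreverM
}

object Demultiplexer {
  def create(readMessage: IO[BackendMessage]): IO[Demultiplexer] =
    Queue.unbounded[IO, Deferred[IO, BackendMessage]]
      .map(new Demultiplexer(_, readMessage))
}
```

The sending fibre would call `expectNext` after each command and only `get` the `Deferred` when it actually needs the response, while `receiverLoop` runs concurrently (e.g. started with `.start` or under a `Supervisor`). Whether something like this can be reconciled with Skunk's error semantics is the open question.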

tpolecat commented 2 years ago

Hi, thanks for the issue.

The big design goal for Skunk is transparent semantics: when things happen in code they happen on the server, so errors happen in the expected place, on the expected fiber (sessions can be used concurrently and we don't want an error on one fiber to be detected on another). So exchanges with the database must be self-contained (i.e., they need to end with ReadyForQuery, which sometimes requires a Flush).

Having said that, we're doing more exchanges than we need to do, and we aren't caching everything that can be cached.

Once this is implemented, the current 6+ exchanges would be reduced to 2+ on the first use.

On subsequent use the initial exchange would be unnecessary, so interactions that fetch a single page of results would require only one exchange.
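For illustration, a minimal sketch of the kind of per-session caching alluded to above, keyed by the SQL text so the Parse/Describe exchange can be skipped on subsequent use; `StatementName` and `describeViaWire` are made-up placeholders, and Skunk's actual cache is more involved.

```scala
import cats.effect.{IO, Ref}

final case class StatementName(value: String) // placeholder for a server-side name

// Remembers which SQL statements have already been prepared on this session.
class PrepareCache(cache: Ref[IO, Map[String, StatementName]]) {

  // Only hit the wire (Parse/Describe) on a cache miss. Simplified: a real
  // implementation would also guard against concurrent misses for the same SQL.
  def prepared(sql: String)(describeViaWire: IO[StatementName]): IO[StatementName] =
    cache.get.map(_.get(sql)).flatMap {
      case Some(name) => IO.pure(name)
      case None =>
        describeViaWire.flatMap(name => cache.update(_ + (sql -> name)).as(name))
    }
}

object PrepareCache {
  val empty: IO[PrepareCache] =
    Ref.of[IO, Map[String, StatementName]](Map.empty).map(new PrepareCache(_))
}
```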

It probably makes sense to pre-fetch the next page of results when processing the result stream, so you can be reading the next page while you're streaming the current page back to the client or whatever.
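fs2 already provides a combinator for this kind of read-ahead; a hedged sketch, where `pages` is a hypothetical stream of already-decoded result pages:

```scala
import cats.effect.IO
import fs2.Stream

// Pull one page ahead of the consumer on a separate fibre, so the next
// Execute round trip can overlap with streaming the current page downstream.
def prefetchedRows[A](pages: Stream[IO, List[A]]): Stream[IO, A] =
  pages
    .prefetchN(1)
    .flatMap(Stream.emits)
```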

In any case there is a plan. Right now I'm working on replacing the session queue, which will make caching easier.