Throughput ceiling at 200k-300k req/s

There seems to be a throughput ceiling because spawning asynchronous queries is essentially single-threaded. The following operations take a surprisingly large amount of time:

spawning a new tokio task
binding a statement and submitting it asynchronously to the driver (the major cost)

With the current design, these operations run serially and do not scale across multiple cores.

Proposed solution:

refactor the main loop to use asynchronous streams (the Stream abstraction)
don't spawn each query as a separate task; instead make each stream concurrent but single-threaded (this should reduce scheduling costs, since awaiting a future is cheaper than spawning a task)
create many independent query streams, spawn them on separate threads, and merge their results using an mpsc channel
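
A rough sketch of the last step only (independent streams on separate threads, merged through an mpsc channel), using nothing but the standard library. It is not the proposed implementation: `run_query` and `run_streams` are hypothetical names, `run_query` is a stand-in for binding and executing a statement, and each worker here issues its queries in a plain sequential loop rather than as a concurrent async stream.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one query. A real benchmark would bind a
/// statement and execute it through the driver here.
fn run_query(stream_id: usize, i: u64) -> u64 {
    (stream_id as u64) * 1_000_000 + i
}

/// Runs `streams` independent workers, each on its own thread and each
/// issuing `per_stream` queries, and merges all results through one
/// mpsc channel. Results arrive in whatever order the threads produce them.
fn run_streams(streams: usize, per_stream: u64) -> Vec<u64> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::with_capacity(streams);

    for stream_id in 0..streams {
        // Each worker gets its own sender; nothing else is shared.
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for i in 0..per_stream {
                // A send error only means the receiver is gone, so stop early.
                if tx.send(run_query(stream_id, i)).is_err() {
                    break;
                }
            }
        }));
    }

    // Drop the original sender so the receiver loop ends once all
    // workers have finished and dropped their clones.
    drop(tx);

    let results: Vec<u64> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    results
}

fn main() {
    let results = run_streams(4, 1000);
    println!("collected {} results", results.len());
}
```

In this shape the per-query work no longer passes through a single submitting thread; only the cheap channel send is shared between workers.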

To be decided later: should all streams share a single Session, or should each stream have its own Session?

pkolaczk/latte, issue #1