will / crystal-pg

a postgres driver for crystal
BSD 3-Clause "New" or "Revised" License
462 stars 77 forks source link

Reuse IO::Sized instances in ResultSet #228

Closed jgaskins closed 3 years ago

jgaskins commented 3 years ago

TL;DR: Reduced CPU consumption by about 73%, but still needs some polish

I've been benchmarking Crystal vs Rust/Go for some reasonably realistic workloads. A while back I compared communicating with Redis (because it's simple) between Rust and Crystal — Crystal beat Rust by a factor of 2-3x.

This week I ran a similar set of benchmarks for Postgres, using Diesel for Rust and Interro for Crystal, with varying horse-sized-duck and duck-sized-horse queries, and Rust beat Crystal by almost the same amount, so I started looking and found that PG::ResultSet#read calls safe_read, which instantiates an IO::Sized. This appears to be called for every column of every row, so if a table is super wide (whether it just accrued cruft) or the result set has a crap-ton of rows, it's going to create a lot of these IO::Sized instances that only live long enough to decode a single value from a result set. Because IO::Sized is allocated on the heap, this results in a lot of garbage.

This PR creates a single instance of IO::Sized to be reused and moves the goal post every time we need to read a value. Here is the result:

Click here for benchmark code (not the same benchmark as used for Interro and Diesel, mentioned above) ```crystal require "benchmark" require "db" require "../src/pg" struct Model include DB::Serializable getter float : Float64 getter int : Int32 getter string : String getter uuid : UUID getter nullable : String? end db = DB.open("postgres:///") query = <<-SQL SELECT random() AS float, (random() * 1000000000)::int AS int, random()::text AS string, uuid_generate_v4() AS uuid, NULL AS nullable FROM generate_series(1, 100000) SQL args = [] of String model = nil bm = Benchmark.measure do db.query_each query do |rs| model = rs.read(Model) end end pp model # Ensure LLVM doesn't optimize out the `rs.read` calls pp wall_clock: bm.real, cpu: bm.total ```
Wall clock CPU
Before 0.193830283 0.135133
After 0.135133 0.036025

This uses 73% less CPU time than the currently released version, which I found kinda mind-boggling. In my benchmark of Interro (Crystal) vs Diesel (Rust), with this patch Interro is now using as little CPU time as Diesel, at 7.04 vs 7.12, respectively:

bin/interro_bench            4.38s user 2.66s system 31% cpu 22.073 total
target/release/diesel_bench  6.17s user 0.95s system 42% cpu 16.630 total

I don't know why the wall-clock times are so different, but it may have to do with similarly wild difference in system CPU time. On my M1 Mac, the system CPU times are nearly identical with this patch so I'm not entirely sure what's up there:

bin/interro_bench            3.03s user 0.47s system 44% cpu 7.940 total
target/release/diesel_bench  4.05s user 0.43s system 57% cpu 7.753 total

NOTE: This is a draft PR purely because I'm not thrilled with the idea of extending a stdlib class to make it externally mutable. It feels brittle because it exposes an implementation details that was previously encapsulated. I only used it because it took absolutely minimal effort. 🙂 I figure a final version of this could either be a simplified implementation of IO::Sized (the existing one isn't huge and includes methods we probably don't need here), but I wanted to get your input on that first.

straight-shoota commented 3 years ago

I'd suggest to make that property writable in the stdlib implementation. This is a perfectly valid use case and I would imagine this isn't that uncommon.

will commented 3 years ago

Great find, thank you for the investigation and PR! If you put together a PR for the crystal stdlib as @straight-shoota suggests please link it here.

Given how small the change will be, I wouldn't be opposed to back-porting it here in a monkey patch with a conditional on crystal version. Especially since it'd just be a temporary measure. But others may disagree since it is a little gross I suppose.

jgaskins commented 3 years ago

Well, then, brb making a PR to Crystal 😄

jgaskins commented 3 years ago

@will I've updated it to be a monkeypatch on IO::Sized and included a comment about why it's there, including a link to the Crystal PR. We have a backlink to the Crystal PR just above this comment, as well. I'll go ahead and mark this one as ready for review.

will commented 3 years ago

@jgaskins great thank you! Could you help me out a little bit for the future do some git rebasing to get the stdlib monkey patch in one commit so it can be easily reverted, then the rest in a second patch?

jgaskins commented 3 years ago

Sweet, thanks Will!