oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Select samples from ClickHouse using the native client #7093

Closed bnaecker closed 6 days ago

bnaecker commented 1 week ago
bnaecker commented 6 days ago

I did a quick benchmark to compare the performance of queries using the native protocol against those of the current JSON-over-HTTP protocol on main. I have an old backup of Dogfood data on my local machine, and ran the following OxQL query:

get physical_data_link:bytes_sent |
    filter serial == 'BRM42220006' && timestamp > @<start_time> && timestamp < @<end_time>

I picked a random start time, and then used a range of end times after that, covering from 1 minute to 10K minutes of data in powers of 10. For each end time, I ran the query 100 times and computed the mean and standard deviation of the total query duration for each protocol. Here are the results:

JSON-over-HTTP

bnaecker@flint : ~/file-cabinet/oxide/omicron/oximeter/db $ CLICKHOUSE_ADDR=192.168.1.82 cargo r --bin oxql-bench
    Blocking waiting for file lock on build directory
   Compiling oximeter-db v0.1.0 (/Users/bnaecker/file-cabinet/oxide/omicron/oximeter/db)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 10.32s
     Running `/Users/bnaecker/file-cabinet/oxide/omicron/target/debug/oxql-bench`
       1 min: 0.05965 +/- 0.00847
      10 min: 0.06001 +/- 0.00828
     100 min: 0.06947 +/- 0.00619
    1000 min: 0.15447 +/- 0.00570
   10000 min: 1.04616 +/- 0.04443

Native (this branch)

bnaecker@flint : ~/file-cabinet/oxide/omicron/oximeter/db $ CLICKHOUSE_ADDR=192.168.1.82 cargo r --bin oxql-bench
   Compiling oximeter-db v0.1.0 (/Users/bnaecker/file-cabinet/oxide/omicron/oximeter/db)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 5.40s
     Running `/Users/bnaecker/file-cabinet/oxide/omicron/target/debug/oxql-bench`
       1 min: 0.05622 +/- 0.00541
      10 min: 0.05522 +/- 0.00459
     100 min: 0.05837 +/- 0.00486
    1000 min: 0.06580 +/- 0.00600
   10000 min: 0.18284 +/- 0.00712
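
For anyone wanting to reproduce this kind of measurement, the procedure described above (repeat a query, then report the mean and standard deviation of the durations) can be sketched roughly as below. This is not the actual `oxql-bench` source. The names `time_runs` and `mean_and_std` are hypothetical, the closure stands in for a real query, and the choice of seconds and of the sample (n - 1) standard deviation are assumptions.

```rust
use std::time::Instant;

/// Mean and sample standard deviation of a set of durations.
/// Assumption: sample (n - 1) standard deviation; the original
/// benchmark may have used the population form instead.
fn mean_and_std(samples: &[f64]) -> (f64, f64) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if samples.len() < 2 {
        return (mean, 0.0);
    }
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

/// Run `run_query` `n_runs` times and return each wall-clock duration in seconds.
fn time_runs<F: FnMut()>(n_runs: usize, mut run_query: F) -> Vec<f64> {
    (0..n_runs)
        .map(|_| {
            let start = Instant::now();
            run_query();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

fn main() {
    // Hypothetical stand-in for issuing one OxQL query against ClickHouse.
    let samples = time_runs(100, || {
        std::hint::black_box((0..10_000u64).sum::<u64>());
    });
    let (mean, std) = mean_and_std(&samples);
    // Same layout as the output lines above.
    println!("{:>8} min: {:.5} +/- {:.5}", 1, mean, std);
}
```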

As expected, for small queries there isn't much of a difference. Serialization is only one part of the total work, so it's kind of in the noise when there aren't that many data points. But the scaling behavior is pretty nice: the native protocol is a bit more than 2x faster at 1K minutes of data (0.154 vs. 0.066), and nearly 6x faster at 10K minutes (1.046 vs. 0.183).
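
The ratios can be checked directly from the mean durations reported in the two runs. The snippet below is just that arithmetic: the numbers are copied from the output above, and the `speedup` helper is a hypothetical name used for illustration.

```rust
/// Ratio of baseline duration to candidate duration (values above 1 mean the candidate is faster).
fn speedup(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

fn main() {
    // (minutes of data, JSON-over-HTTP mean, native mean), copied from the benchmark output.
    let results = [
        (1u32, 0.05965, 0.05622),
        (10, 0.06001, 0.05522),
        (100, 0.06947, 0.05837),
        (1000, 0.15447, 0.06580),
        (10000, 1.04616, 0.18284),
    ];
    for &(minutes, http, native) in results.iter() {
        println!("{:>8} min: {:.2}x", minutes, speedup(http, native));
    }
}
```

This compares means only. At the small end the one-standard-deviation ranges of the two protocols overlap, so those ratios should not be read as a real difference.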