Audit and possibly update compression codecs for ClickHouse data

I was looking at dogfood yesterday with @andrewjstone for some measurements, it's pretty interesting:

oxz_clickhouse_aa646c82-c6d7-4d0c-8401-150130927759.local :) select name, formatReadableQuantity(total_rows), formatReadableSize(total_bytes) from system.tables where database = 'oximeter';

SELECT
    name,
    formatReadableQuantity(total_rows),
    formatReadableSize(total_bytes)
FROM system.tables
WHERE database = 'oximeter'

Query id: 2e90da34-3277-448e-873b-c57d64b37b44

┌─name───────────────────────┬─formatReadableQuantity(total_rows)─┬─formatReadableSize(total_bytes)─┐
│ fields_bool                │ 0.00                               │ 0.00 B                          │
│ fields_i16                 │ 0.00                               │ 0.00 B                          │
│ fields_i32                 │ 0.00                               │ 0.00 B                          │
│ fields_i64                 │ 149.41 thousand                    │ 1.18 MiB                        │
│ fields_i8                  │ 0.00                               │ 0.00 B                          │
│ fields_ipaddr              │ 64.10 thousand                     │ 624.01 KiB                      │
│ fields_string              │ 570.18 thousand                    │ 6.58 MiB                        │
│ fields_u16                 │ 63.87 thousand                     │ 560.87 KiB                      │
│ fields_u32                 │ 61.98 thousand                     │ 503.10 KiB                      │
│ fields_u64                 │ 233.00                             │ 4.88 KiB                        │
│ fields_u8                  │ 0.00                               │ 0.00 B                          │
│ fields_uuid                │ 813.00 thousand                    │ 10.31 MiB                       │
│ measurements_bool          │ 14.97 million                      │ 115.94 MiB                      │
│ measurements_bytes         │ 0.00                               │ 0.00 B                          │
│ measurements_cumulativef32 │ 0.00                               │ 0.00 B                          │
│ measurements_cumulativef64 │ 0.00                               │ 0.00 B                          │
│ measurements_cumulativei64 │ 247.16 million                     │ 1.96 GiB                        │
│ measurements_cumulativeu64 │ 3.15 billion                       │ 24.27 GiB                       │
│ measurements_f32           │ 0.00                               │ 0.00 B                          │
│ measurements_f64           │ 171.00                             │ 3.38 KiB                        │
│ measurements_histogramf32  │ 0.00                               │ 0.00 B                          │
│ measurements_histogramf64  │ 202.30 million                     │ 3.77 GiB                        │
│ measurements_histogrami16  │ 0.00                               │ 0.00 B                          │
│ measurements_histogrami32  │ 0.00                               │ 0.00 B                          │
│ measurements_histogrami64  │ 0.00                               │ 0.00 B                          │
│ measurements_histogrami8   │ 0.00                               │ 0.00 B                          │
│ measurements_histogramu16  │ 0.00                               │ 0.00 B                          │
│ measurements_histogramu32  │ 0.00                               │ 0.00 B                          │
│ measurements_histogramu64  │ 0.00                               │ 0.00 B                          │
│ measurements_histogramu8   │ 0.00                               │ 0.00 B                          │
│ measurements_i16           │ 0.00                               │ 0.00 B                          │
│ measurements_i32           │ 0.00                               │ 0.00 B                          │
│ measurements_i64           │ 2.56 million                       │ 20.06 MiB                       │
│ measurements_i8            │ 0.00                               │ 0.00 B                          │
│ measurements_string        │ 0.00                               │ 0.00 B                          │
│ measurements_u16           │ 0.00                               │ 0.00 B                          │
│ measurements_u32           │ 0.00                               │ 0.00 B                          │
│ measurements_u64           │ 535.92 million                     │ 3.08 GiB                        │
│ measurements_u8            │ 0.00                               │ 0.00 B                          │
│ timeseries_schema          │ 136.00                             │ 4.97 KiB                        │
│ version                    │ 4.00                               │ 383.00 B                        │
└────────────────────────────┴────────────────────────────────────┴─────────────────────────────────┘

41 rows in set. Elapsed: 0.002 sec.

That's the total number of rows and bytes on disk for every table. And here it is for the columns, with more details comparing compressed and uncompressed sizes:

SELECT
    table,
    name,
    formatReadableSize(data_compressed_bytes) AS compressed,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    data_uncompressed_bytes / data_compressed_bytes AS compression_ratio
FROM system.columns
WHERE (database = 'oximeter') AND (data_uncompressed_bytes > 0)
ORDER BY data_uncompressed_bytes DESC

Query id: 76c915a1-30cc-422b-b23d-2e52745e16b5

┌─table──────────────────────┬─name─────────────────────────┬─compressed─┬─uncompressed─┬──compression_ratio─┐
│ measurements_histogramf64  │ counts                       │ 909.15 MiB │ 130.38 GiB   │ 146.85047450022356 │
│ measurements_histogramf64  │ bins                         │ 598.05 MiB │ 130.38 GiB   │ 223.24001762710947 │
│ measurements_cumulativeu64 │ timeseries_name              │ 389.57 MiB │ 90.96 GiB    │ 239.08673083513202 │
│ measurements_cumulativeu64 │ datum                        │ 5.29 GiB   │ 26.66 GiB    │  5.040083656944843 │
│ measurements_cumulativeu64 │ timestamp                    │ 18.67 GiB  │ 23.70 GiB    │ 1.2690405602296886 │
│ measurements_cumulativeu64 │ timeseries_key               │ 111.29 MiB │ 23.70 GiB    │  218.0501897202519 │
│ measurements_cumulativeu64 │ start_time                   │ 111.46 MiB │ 23.70 GiB    │ 217.70598102683994 │
│ measurements_u64           │ timeseries_name              │ 81.97 MiB  │ 19.27 GiB    │  240.6641221334969 │
│ measurements_histogramf64  │ p99_marker_positions         │ 80.41 MiB  │ 9.31 GiB     │ 118.59737605700495 │
│ measurements_histogramf64  │ p99_marker_heights           │ 62.70 MiB  │ 9.31 GiB     │  152.0864321593366 │
│ measurements_histogramf64  │ p90_marker_positions         │ 80.01 MiB  │ 9.31 GiB     │ 119.19142565966818 │
│ measurements_histogramf64  │ p90_marker_heights           │ 80.89 MiB  │ 9.31 GiB     │ 117.89083480980896 │
│ measurements_histogramf64  │ p50_desired_marker_positions │ 84.53 MiB  │ 9.31 GiB     │ 112.81113723973245 │
│ measurements_histogramf64  │ p50_marker_positions         │ 79.63 MiB  │ 9.31 GiB     │ 119.76081854764136 │
│ measurements_histogramf64  │ p99_desired_marker_positions │ 97.85 MiB  │ 9.31 GiB     │  97.45842662635648 │
│ measurements_histogramf64  │ p50_marker_heights           │ 95.76 MiB  │ 9.31 GiB     │   99.5851363389799 │
│ measurements_histogramf64  │ p90_desired_marker_positions │ 92.90 MiB  │ 9.31 GiB     │ 102.65700559328867 │
│ measurements_histogramf64  │ timeseries_name              │ 33.18 MiB  │ 7.57 GiB     │  233.5322928053325 │
│ measurements_cumulativei64 │ timeseries_name              │ 26.99 MiB  │ 6.25 GiB     │ 237.21866779228318 │
│ measurements_u64           │ datum                        │ 21.92 MiB  │ 4.50 GiB     │ 210.34105049480357 │
│ measurements_u64           │ timestamp                    │ 2.97 GiB   │ 4.00 GiB     │ 1.3485513022482545 │
│ measurements_u64           │ timeseries_key               │ 18.70 MiB  │ 4.00 GiB     │  219.0994059146269 │
│ measurements_cumulativei64 │ datum                        │ 92.89 MiB  │ 2.23 GiB     │ 24.604943104144724 │
│ measurements_cumulativei64 │ start_time                   │ 10.09 MiB  │ 1.98 GiB     │ 201.25950923166573 │
│ measurements_cumulativei64 │ timestamp                    │ 1.98 GiB   │ 1.98 GiB     │ 1.0035171013115416 │
│ measurements_cumulativei64 │ timeseries_key               │ 10.01 MiB  │ 1.98 GiB     │   202.955072204467 │
│ measurements_histogramf64  │ start_time                   │ 7.18 MiB   │ 1.55 GiB     │ 221.28830553153372 │
│ measurements_histogramf64  │ timestamp                    │ 1.56 GiB   │ 1.55 GiB     │  0.995935851467409 │
│ measurements_histogramf64  │ min                          │ 7.33 MiB   │ 1.55 GiB     │ 216.92139035421621 │
│ measurements_histogramf64  │ max                          │ 7.34 MiB   │ 1.55 GiB     │ 216.44710887220677 │
│ measurements_histogramf64  │ sum_of_samples               │ 27.32 MiB  │ 1.55 GiB     │  58.17056488098587 │
│ measurements_histogramf64  │ squared_mean                 │ 27.11 MiB  │ 1.55 GiB     │ 58.626569812110986 │
│ measurements_histogramf64  │ timeseries_key               │ 7.24 MiB   │ 1.55 GiB     │  219.4561308699875 │
│ measurements_bool          │ timeseries_name              │ 1.30 MiB   │ 305.63 MiB   │ 234.20297815493538 │
│ measurements_bool          │ timestamp                    │ 126.81 MiB │ 127.01 MiB   │ 1.0015793946449754 │
│ measurements_bool          │ timeseries_key               │ 599.61 KiB │ 127.01 MiB   │ 216.91126442173015 │
│ measurements_i64           │ timeseries_name              │ 266.65 KiB │ 61.43 MiB    │ 235.88718673664087 │
│ measurements_bool          │ datum                        │ 145.32 KiB │ 31.75 MiB    │  223.7565673196465 │
│ measurements_i64           │ datum                        │ 352.31 KiB │ 24.48 MiB    │  71.14463552147953 │
│ measurements_i64           │ timeseries_key               │ 103.29 KiB │ 21.76 MiB    │ 215.70501196025225 │
│ measurements_i64           │ timestamp                    │ 21.60 MiB  │ 21.76 MiB    │ 1.0071676325075236 │
│ fields_uuid                │ timeseries_name              │ 89.17 KiB  │ 19.84 MiB    │ 227.85120084108158 │
│ fields_string              │ timeseries_name              │ 77.24 KiB  │ 17.63 MiB    │  233.7552529773193 │
│ fields_uuid                │ field_value                  │ 4.00 MiB   │ 10.53 MiB    │ 2.6312247192377023 │
│ fields_string              │ field_value                  │ 1.93 MiB   │ 9.30 MiB     │ 4.8180278387756985 │
│ fields_uuid                │ field_name                   │ 30.25 KiB  │ 6.54 MiB     │   221.430038096468 │
│ fields_uuid                │ timeseries_key               │ 5.17 MiB   │ 5.26 MiB     │ 1.0186928720229966 │
│ fields_string              │ timeseries_key               │ 3.85 MiB   │ 3.84 MiB     │ 0.9967238959519455 │
│ fields_string              │ field_name                   │ 13.59 KiB  │ 3.01 MiB     │ 226.76424107463544 │
└────────────────────────────┴──────────────────────────────┴────────────┴──────────────┴────────────────────┘

49 rows in set. Elapsed: 0.015 sec.

There's a few notable things. First, the compression ratios are already insane. That's great, but maybe not super surprising, because the data distribution works very well with sorting keys we use. There are lots of stretches of duplicated data, so the default compression (ZSTD, last I checked) works really well.

Second, we can see where the compression ratios don't work as well: timestamps and timeseries keys are the worst performing. I'm not sure about timeseries keys, but we should look into the delta or double-delta encodings for the timestamps, which replaces them with their differences. Those values should have much lower precision and so compress much better. But we should test it out.

The other thing that I would like to consider is dictionary encodings for the strings. Timeseries names are a good example -- we expect to have a relatively small number of these even in the limit, probably on the order of a thousand. The LowCardinality type wrapper should help here, although we are already compressing these quite well.

oxidecomputer / omicron

Audit and possibly update compression codecs for ClickHouse data #4060