jaegertracing / jaeger-clickhouse

Jaeger ClickHouse storage plugin implementation
Apache License 2.0

Decide on default encoding #17

Closed pavolloffay closed 3 years ago

pavolloffay commented 3 years ago

Right now the default encoding is JSON, but only because it was set that way historically. I would like to understand why JSON is preferred over protobuf.

https://github.com/pavolloffay/jaeger-clickhouse/blob/main/config.yaml#L8

EinKrebs commented 3 years ago

I ran some tests, each pushing 100k traces with tracegen, using JSON and protobuf encoding. After each test I ran the following query:

SELECT 
    table, 
    sum(marks) AS marks, 
    sum(rows) AS rows, 
    sum(bytes_on_disk) AS bytes_on_disk, 
    sum(data_compressed_bytes) AS data_compressed_bytes, 
    sum(data_uncompressed_bytes) AS data_uncompressed_bytes, 
    toDecimal64(data_uncompressed_bytes / data_compressed_bytes, 2) AS compression_ratio, 
    toDecimal64(data_compressed_bytes / rows, 2) AS compressed_bytes_per_row
FROM system.parts 
WHERE table LIKE 'jaeger_%'
GROUP BY table
ORDER BY table ASC

Here are the results:

protobuf_1:

┌─table───────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index_v2 │   503 │ 490000 │       4091055 │               3967057 │               118629252 │             29.90 │                     8.09 │
│ jaeger_spans_v2 │   454 │ 440000 │      12522443 │              12484567 │               151886845 │             12.16 │                    28.37 │
└─────────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

protobuf_2:

┌─table───────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index_v2 │   510 │ 496514 │       4158369 │               4032902 │               120204132 │             29.80 │                     8.12 │
│ jaeger_spans_v2 │   461 │ 446514 │      12674441 │              12636040 │               154177577 │             12.20 │                    28.29 │
└─────────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

protobuf_3:

┌─table───────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index_v2 │   513 │ 498950 │       4170922 │               4045016 │               120791586 │             29.86 │                     8.10 │
│ jaeger_spans_v2 │   464 │ 448950 │      12726910 │              12688290 │               154971099 │             12.21 │                    28.26 │
└─────────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

json_1:

┌─table───────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index_v2 │   312 │ 300000 │       2533411 │               2459353 │                72624034 │             29.52 │                     8.19 │
│ jaeger_spans_v2 │   263 │ 250000 │       8569218 │               8547874 │               175600786 │             20.54 │                    34.19 │
└─────────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

json_2:

┌─table───────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index_v2 │   321 │ 308351 │       2610419 │               2534658 │                74342312 │             29.33 │                     8.22 │
│ jaeger_spans_v2 │   272 │ 258351 │       8858087 │               8836072 │               181212076 │             20.50 │                    34.20 │
└─────────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

json_3:

┌─table───────────┬─marks─┬───rows─┬─bytes_on_disk─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─compression_ratio─┬─compressed_bytes_per_row─┐
│ jaeger_index_v2 │   319 │ 305652 │       2592937 │               2517449 │                74005512 │             29.39 │                     8.23 │
│ jaeger_spans_v2 │   270 │ 255652 │       8767835 │               8745966 │               179576608 │             20.53 │                    34.21 │
└─────────────────┴───────┴────────┴───────────────┴───────────────────────┴─────────────────────────┴───────────────────┴──────────────────────────┘

So JSON seems to use less disk space. I'm gonna try more spans, then report.
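
For reference, the jaeger_spans_v2 figures from the first run of each encoding can be put side by side with a short script. This is only a sketch: the numbers are copied from the protobuf_1 and json_1 results above, and the names `runs` and `compressed_bytes_per_row` are made up for illustration. The two runs stored different row counts (440000 versus 250000), so it prints both the total and the per-row value.

```python
# Figures copied from the jaeger_spans_v2 rows of the protobuf_1 and json_1 results.
# The dictionary layout and function name are illustrative, not part of the plugin.
runs = {
    "protobuf_1": {"rows": 440000, "data_compressed_bytes": 12484567},
    "json_1": {"rows": 250000, "data_compressed_bytes": 8547874},
}


def compressed_bytes_per_row(run):
    """Return compressed bytes divided by stored rows for one run."""
    return run["data_compressed_bytes"] / run["rows"]


for name, run in runs.items():
    total = run["data_compressed_bytes"]
    per_row = compressed_bytes_per_row(run)
    print(f"{name}: rows={run['rows']} total={total} per_row={per_row:.2f}")
```

The per-row value should match the compressed_bytes_per_row column that the query already reports, which makes it a quick check that the figures were copied correctly.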

EinKrebs commented 3 years ago

On 1M traces JSON still uses less disk space.

pavolloffay commented 3 years ago

Thanks, then let's keep using JSON as the default.