parca-dev / parca

Continuous profiling for analysis of CPU and memory usage, down to the line number and throughout time. Saving infrastructure cost, improving performance, and increasing reliability.
https://parca.dev/
Apache License 2.0

Building a continuous profiling platform based on Parca #1994

Open zdyj3170101136 opened 1 year ago

zdyj3170101136 commented 1 year ago

I am an engineer. Recently we built a continuous profiling platform based on Parca. I am excited to share it with you!

Use ClickHouse to store stacktraces

Parca's stacktraces are stored in memory, in FrostDB.

Problem

We found this has some problems:

High write amplification

With 80 KB/s of inbound network traffic, the machine's memory usage increases by 0.2 GB/s.

Slow profileRange and profileType

These two APIs take about 10 seconds. profileType has to query all data from FrostDB, and profileRange has to read every sample's value to compute the total value of each profile.

Solution

So we tried using ClickHouse to store profiles and stacktraces. We have two tables: an index table and a stacktrace table.

Index table:

| profilingID | totalValue | labels.key | labels.value |
|-------------|------------|------------|--------------|
| D6HugNjT6dV | 20000000 | ['instance','job'] | ['xxxxxx:12580','xxxxxx'] |

Stacktrace table:

| profilingID | value | stacktrace | timestamp |
|-------------|-------|------------|-----------|
| D6HugNjT6dV | 524328 | stacktrace | 2022-09-28 10:20:03 |

Because one profile can contain thousands of stacktraces, the stacktrace table is much larger than the index table.
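A minimal sketch of the two tables as ClickHouse DDL (the column types, the extra timestamp column on the index table, and the ORDER BY keys are assumptions for illustration, not our exact schema):

```sql
-- Hypothetical index table: one small row per profile,
-- carrying the precomputed total value and the labels.
CREATE TABLE parca_index_local
(
    profilingID    String,
    totalValue     UInt64,
    `labels.key`   Array(String),
    `labels.value` Array(String),
    timestamp      DateTime  -- assumed: needed for time-range queries
)
ENGINE = MergeTree
ORDER BY (timestamp, profilingID);

-- Hypothetical stacktrace table: thousands of rows per profile,
-- one per stacktrace, keyed by the owning profilingID.
CREATE TABLE parca_stacktrace_local
(
    profilingID String,
    value       UInt64,
    stacktrace  String,
    timestamp   DateTime
)
ENGINE = MergeTree
ORDER BY (profilingID, timestamp);
```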

Queries are very fast

Now the profileRange and profileType APIs only need to query the index table. It takes less than 100 ms to compute profileRange over thousands of profile series, and less than 100 ms to fetch a single profile from ClickHouse.
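For illustration, a profileRange-style query against the index table sketched above only scans the small index rows (the time window and selected columns here are assumptions):

```sql
-- profileRange never touches the large stacktrace table:
-- the per-profile total value is already precomputed in the index.
SELECT profilingID, totalValue, timestamp
FROM parca_index_local
WHERE timestamp BETWEEN toDateTime('2022-09-28 10:00:00')
                    AND toDateTime('2022-09-28 11:00:00')
ORDER BY timestamp;
```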

High compression ratio

The storage compression ratio is more than 7: about 4 TB of raw data compresses down to roughly 500 GB.

┌─table──────────────────┬────marks─┬────────rows─┬─compressed─┬─uncompressed─┬─compression_ratio─┬─bytes_per_row─┬─pk_in_memory─┐
│ parca_index_local      │    48076 │    49040727 │ 4.43 GiB   │ 6.60 GiB     │              1.48 │ 96.99 B       │ 359.28 KiB   │
│ parca_stacktrace_local │ 16047052 │ 16432042504 │ 531.15 GiB │ 3.91 TiB     │              7.54 │ 34.71 B       │ 1.45 GiB     │
└────────────────────────┴──────────┴─────────────┴────────────┴──────────────┴───────────────────┴───────────────┴──────────────┘
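Per-table statistics like the above can be reproduced with a standard aggregation over ClickHouse's system.parts (a sketch, not necessarily the exact query we ran):

```sql
SELECT
    table,
    sum(marks)                                       AS marks,
    sum(rows)                                        AS rows,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio,
    round(sum(data_compressed_bytes) / sum(rows), 2) AS bytes_per_row,
    formatReadableSize(sum(primary_key_bytes_in_memory)) AS pk_in_memory
FROM system.parts
WHERE active AND table LIKE 'parca_%_local'
GROUP BY table;
```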

Visualizing source and numLabel

We use github.com/google/pprof to visualize profiles. It lets users view source code and numLabels.

Use ClickHouse to store metadata

The metadata was stored in BadgerDB, which did not allow us to deploy multiple Parca servers.

It also sometimes caused more than 700% iowait CPU during compaction (on a 100 GB SSD with 30 GB used).

So we use ClickHouse to store metadata as well (Parca should rely on only one kind of store), with a local cache to deduplicate keys.
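A minimal sketch of such a metadata table, assuming a ReplacingMergeTree so that duplicate keys that slip past the local cache are collapsed in the background (the engine choice and the key/value columns are assumptions):

```sql
-- Hypothetical metadata table: ReplacingMergeTree deduplicates rows
-- with the same ORDER BY key during background merges, so re-inserting
-- an already-seen key is harmless.
CREATE TABLE parca_metadata_local
(
    key   String,
    value String
)
ENGINE = ReplacingMergeTree
ORDER BY key;
```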

High compression ratio

The compression ratio is close to 7.

┌─table────────────────┬─marks─┬─────rows─┬─compressed─┬─uncompressed─┬─compression_ratio─┬─bytes_per_row─┬─pk_in_memory─┐
│ parca_metadata_local │ 12100 │ 12371734 │ 1.72 GiB   │ 11.82 GiB    │              6.87 │ 149.36 B      │ 1.67 MiB     │
└──────────────────────┴───────┴──────────┴────────────┴──────────────┴───────────────────┴───────────────┴──────────────┘

Great search speed

With about 5000 stacktraces, resolving them all takes less than 1 s.
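For illustration, resolving a batch of stacktraces then amounts to a primary-key lookup on the metadata table sketched above (the key values here are hypothetical):

```sql
-- Batch lookup over the ORDER BY key; ClickHouse only reads the
-- granules containing the requested keys.
SELECT key, value
FROM parca_metadata_local
WHERE key IN ('loc-1', 'loc-2', 'loc-3');
```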

Manual scrape

We added a manual-scrape HTML page: the user enters an IP, port, and endpoint, clicks 'allocs', and is automatically redirected to Parca.

MUlan2004 commented 1 year ago

Do you have a PR/fork available that incorporates ClickHouse as the storage backend?