thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0
13.02k stars 2.09k forks

Query: grpc compression not working #5827

Open yutian1224 opened 1 year ago

yutian1224 commented 1 year ago

Version:

Thanos: v0.29.0-rc.0
Image: thanosio/thanos:v0.29.0-rc.0

What happened:

I tested gRPC compression with the latest version of the image, trying to reduce the traffic transmitted between components.

Data flow diagram: store(grpc) -> query1(grpc) -> query2(http) -> queryFrontend

query1 traffic map (yellow: with --grpc-compression=snappy, green: with --grpc-compression=none): [image]

query2 traffic map (yellow: with --grpc-compression=snappy, green: with --grpc-compression=none): [image]

What you expected to happen:

The bandwidth in query1 & query2 should be reduced after turning on grpc compression.

Details in args:

store: receive --log.level=info --log.format=logfmt --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:10902 --remote-write.address=0.0.0.0:19291 --objstore.config=$(OBJSTORE_CONFIG) --tsdb.path=/var/thanos/receive --label=thanosreplica="$(NAME)" --label=receive="true" --tsdb.retention=1d --receive.local-endpoint="$(ENDPOINT)"

query1: query --log.level=info --log.format=logfmt --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:10902 --endpoint="$(ENDPOINT)" --query.auto-downsampling --query.default-step=30s --query.metadata.default-time-range=5m --query.max-concurrent=100 --query.max-concurrent-select=20 --grpc-compression=snappy

query2: query --log.level=info --log.format=logfmt --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:10902 --endpoint="$(QUERY1)" --query.auto-downsampling --query.default-step=30s --query.metadata.default-time-range=5m --query.max-concurrent=100 --query.max-concurrent-select=20 --grpc-compression=snappy

frontend: query-frontend --log.level=info --log.format=logfmt --web.disable-cors --http-address=0.0.0.0:10902 --query-frontend.compress-responses --query-frontend.downstream-url="$(QUERY2)"

GiedriusS commented 1 year ago

What about the average traffic usage difference? It's hard to tell what's happening just from usage graphs. I'm 99% sure that compression works because with --grpc-compression=snappy my project https://github.com/GiedriusS/thanos-rust doesn't work (https://github.com/hyperium/tonic/issues/282):

Error executing query: proxy Series(): rpc error: code = Aborted desc = receive series from Addr: 127.0.0.1:50051 LabelSets: {dc="hx", prometheus_node_id="5"} Mint: -9223372036854775808 Maxt: 9223372036854775807: rpc error: code = Unimplemented desc = Message compressed, compression support not enabled.

Perhaps your traffic consists of a lot of unique labels or you have lots of different, small queries hence there's no obvious effect.
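
To illustrate that last point, here is a minimal sketch of how payload shape affects compression ratio. It is not Thanos code: `series_payload` and `ratio` are hypothetical helpers, the label-set text format is made up for the example, and `zlib` from the Python standard library stands in for snappy, so the absolute numbers will differ from what Thanos sees. Only the relative ordering is the point.

```python
import random
import zlib


def series_payload(n_series: int, unique_labels: bool, seed: int = 0) -> bytes:
    """Build a fake text payload of label sets (made-up format, not the Thanos wire format)."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_series):
        if unique_labels:
            # High-cardinality case: every series carries random label values.
            pod = "%032x" % rng.getrandbits(128)
            node = "%032x" % rng.getrandbits(128)
        else:
            # Low-cardinality case: the same label values repeat on every series.
            pod = "api-server"
            node = "node-1"
        lines.append('{dc="hx",pod="%s",node="%s"}' % (pod, node))
    return "\n".join(lines).encode()


def ratio(payload: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(payload)) / len(payload)


# One large response with repeated labels, one large response with unique
# labels, and one tiny response such as a small rule query might return.
repetitive = ratio(series_payload(1000, unique_labels=False))
unique = ratio(series_payload(1000, unique_labels=True))
small = ratio(series_payload(1, unique_labels=False))

print(f"large, repeated labels: {repetitive:.2f}")
print(f"large, unique labels:   {unique:.2f}")
print(f"single small response:  {small:.2f}")
```

In this sketch the large, repetitive payload shrinks dramatically, the payload with unique label values shrinks far less, and the single tiny payload does not shrink at all once the compressor's framing overhead is counted. Since gRPC compresses each message separately, a workload dominated by many small responses could plausibly show little change in a bandwidth graph even with compression enabled.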

yutian1224 commented 1 year ago

> Perhaps your traffic consists of a lot of unique labels or you have lots of different, small queries hence there's no obvious effect.

This may be the key: we have a lot of alerting and recording rules, and queries with large time spans are also sharded by the query frontend.