thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io

thanos-store is too slow for queries and I don't know where the time went #6945

Open Melody-zyy opened 10 months ago

Melody-zyy commented 10 months ago

Thanos, Prometheus and Golang version used: thanos-store version: v0.30.2

Object Storage Provider: Tencent Cloud COS (txyun cos)

What happened: I used thanos-store to query historical data, but it is too slow. I have collected many metrics, but I still don't know where the time went.

What you expected to happen: I want the data to be returned within 20 seconds, or at least to know where the time goes when I query historical data.

How to reproduce it (as minimally and precisely as possible):

Full logs to relevant components: This log appears because the query hits the thanos-query timeout, which cancels the subsequent thanos-store requests to the memcached cache:

level=warn ts=2023-11-30T14:51:06.285175671Z caller=memcached_client.go:436 name=index-cache msg="failed to fetch items from memcached" numKeys=5 firstKey=P:01HFYJB6BXH9BD03Z6H29VXEXG:iWpYeQElI53MP3C4cUnj44BIr-dGk2AQ8I4-BjlebIg err="context canceled"

Anything else we need to know:

  1. thanos-store is split into 10 shards based on block ID (see the sketch after this list)
  2. thanos-compactor runs compaction and downsampling normally
  3. memcached is used for caching
  4. thanos-query runs with --store.response-timeout=40s
thanos-store config: (screenshot)
index-cache.yml: (screenshot)
bucket-cache.yml: (screenshot)
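
The screenshots themselves are not reproduced here, so for context, a memcached index cache configuration in Thanos typically looks roughly like the sketch below. The address and the tuning values are placeholders, not the values from the screenshots:

```yaml
# Hypothetical index-cache.yml (passed via --index-cache.config-file).
type: MEMCACHED
config:
  addresses: ["memcached-index.example.svc:11211"]  # placeholder endpoint
  timeout: 500ms
  max_idle_connections: 100
  max_async_concurrency: 20
  max_async_buffer_size: 10000
  max_get_multi_concurrency: 100
  max_get_multi_batch_size: 0        # 0 means no batch size limit
  max_item_size: 1MiB
  dns_provider_update_interval: 10s
```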

yeya24 commented 10 months ago

There could be different reasons. Do you have tracing set up? Or, if you enable debug logging in Thanos Store, we can take a look at the "stats query processed" log line, which contains detailed query stats.

Melody-zyy commented 10 months ago

  1. How do I set up tracing?
  2. How do I enable debug logging?
  3. Do you need some metric data for reference? I can provide it!

Melody-zyy commented 10 months ago

I want to know whether the time is spent on pulling data or on processing data, or whether there is a problem with the memcached server. Are there any metrics that reflect this?

yeya24 commented 10 months ago

You can set up tracing by following our doc: https://thanos.io/tip/thanos/tracing.md/#tracing. It provides some examples.
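
For example, a minimal Jaeger tracing configuration passed to thanos store via --tracing.config-file could look like the sketch below; the agent address, service name, and the always-on sampler are illustrative choices for debugging, not settings from this thread:

```yaml
# Hypothetical tracing config; point it at your own Jaeger agent/collector.
type: JAEGER
config:
  service_name: thanos-store
  agent_host: jaeger-agent.example.svc   # placeholder address
  agent_port: 6831
  sampler_type: const                    # sample every request while investigating
  sampler_param: 1
```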

Debug logging can be enabled via --log.level=debug.

There are also metrics in the Store Gateway showing request latency, including the latency of requests to memcached.
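
For instance (these rules are not from the thread, and the exact metric names should be verified against the store's /metrics endpoint for this Thanos version), the overall Series latency, memcached latency, and object storage latency can be tracked with recording rules such as:

```yaml
groups:
  - name: thanos-store-latency  # hypothetical rule group
    rules:
      # p99 of the Store Gateway's "get all" phase of a Series call
      - record: job:thanos_bucket_store_series_get_all_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(thanos_bucket_store_series_get_all_duration_seconds_bucket[5m])))
      # p99 latency of memcached operations (e.g. getmulti, set)
      - record: job:thanos_memcached_operation_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum by (le, operation) (rate(thanos_memcached_operation_duration_seconds_bucket[5m])))
      # p99 latency of raw object storage operations (e.g. get, get_range)
      - record: job:thanos_objstore_bucket_operation_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum by (le, operation) (rate(thanos_objstore_bucket_operation_duration_seconds_bucket[5m])))
```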

Melody-zyy commented 10 months ago

OK, thanks, I will try.

Melody-zyy commented 10 months ago

level=debug ts=2023-12-04T03:37:25.626095328Z caller=bucket.go:1293 msg="stats query processed" request="min_time:1700252244000 max_time:1700264301000 matchers:xxx max_resolution_window:3600000 aggregates:COUNTER " stats="&{blocksQueried:8 postingsTouched:14 PostingsTouchedSizeSum:179MiB365KiB230B postingsToFetch:0 postingsFetched:1 PostingsFetchedSizeSum:178MiB551KiB436B postingsFetchCount:1 PostingsFetchDurationSum:37.181431729s cachedPostingsCompressions:1 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:178MiB551KiB436B CachedPostingsCompressedSizeSum:2MiB964KiB33B CachedPostingsCompressionTimeSum:1.119731555s cachedPostingsDecompressions:13 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:13.56747ms seriesTouched:49793 SeriesTouchedSizeSum:6MiB279KiB8B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:4855 ChunksTouchedSizeSum:1MiB356KiB606B chunksFetched:4855 ChunksFetchedSizeSum:19MiB561KiB685B chunksFetchCount:5 ChunksFetchDurationSum:2.369927365s GetAllDuration:38.304761546s mergedSeriesCount:4855 mergedChunksCount:4855 MergeDuration:29.653014ms}" err=null

level=debug ts=2023-12-04T07:01:48.448042489Z caller=bucket.go:1293 msg="stats query processed" request="min_time:1699642800000 max_time:1699646400000 matchers:<xxx> matchers:<xxx > max_resolution_window:3600000 aggregates:COUNT aggregates:SUM " stats="&{blocksQueried:8 postingsTouched:24 PostingsTouchedSizeSum:381MiB116KiB432B postingsToFetch:0 postingsFetched:2 PostingsFetchedSizeSum:378MiB861KiB256B postingsFetchCount:2 PostingsFetchDurationSum:40.695843201s cachedPostingsCompressions:2 cachedPostingsCompressionErrors:0 CachedPostingsOriginalSizeSum:378MiB861KiB256B CachedPostingsCompressedSizeSum:5MiB747KiB422B CachedPostingsCompressionTimeSum:2m27.28031484s cachedPostingsDecompressions:22 cachedPostingsDecompressionErrors:0 CachedPostingsDecompressionTimeSum:461.760505ms seriesTouched:102 SeriesTouchedSizeSum:12KiB79B seriesFetched:0 SeriesFetchedSizeSum:0B seriesFetchCount:0 SeriesFetchDurationSum:0s chunksTouched:6 ChunksTouchedSizeSum:4KiB57B chunksFetched:6 ChunksFetchedSizeSum:36KiB317B chunksFetchCount:1 ChunksFetchDurationSum:48.101376ms GetAllDuration:1m34.625021851s mergedSeriesCount:0 mergedChunksCount:0 MergeDuration:0s}" err="rpc error: code = Unknown desc = failed to receive any data from 01HFZ77QH5HNNVBCW9R5WT8HDW: context canceled"

When the query is slow, I look at the debug log and find that a large part of the time is spent on fetching postings. Is this because the postings data is too large?