grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

[performance] label: takes 25 seconds when cardinality >= 10w (~100k) #6243

Open liguozhong opened 2 years ago

liguozhong commented 2 years ago

Describe the bug

/loki/api/v1/label?start=1653399718498000000&end=1653403318498000000

Loki's slow label HTTP handler makes for a poor experience in Grafana's Loki Explore: it takes 25 seconds to load the label prompt box on the left.
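For context, the request can be timed from the client side. Below is a minimal, illustrative timing harness, not Loki code: the endpoint, time range, and tenant ID come from this report, while the localhost address is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Endpoint and time range from this report; the address is assumed.
	url := "http://localhost:3100/loki/api/v1/label?start=1653399718498000000&end=1653403318498000000"

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	// Multi-tenant Loki reads the tenant ID from the X-Scope-OrgID header.
	req.Header.Set("X-Scope-OrgID", "1662_qamopdln")

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)

	// On the affected tenant this elapsed time is reportedly ~25s.
	fmt.Printf("status=%d elapsed=%s bytes=%d\n", resp.StatusCode, time.Since(start), len(body))
}
```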

level=debug ts=2022-05-25T03:16:47.274713123Z caller=series_index_store.go:95 org_id=1662_qamopdln traceID=46b14075ea76f47d series-ids=64129

level=debug ts=2022-05-25T03:16:47.275711308Z caller=series_index_store.go:390 org_id=1662_qamopdln traceID=46b14075ea76f47d msg="post intersection" matchers=1 ids=64191

Code path:

pkg/querier/querier.go:352
    func (q *SingleTenantQuerier) Label(ctx context.Context, req *logproto.LabelRequest) (*logproto.LabelResponse, error) { ... }

pkg/storage/stores/series/series_index_store.go:220
    func (c *indexStore) LabelNamesForMetricName(ctx context.Context, userID string, from, through model.Time, metricName string) ([]string, error) { ... }

pkg/storage/stores/series/series_index_store.go:502
    func (c *indexStore) lookupEntriesByQueries(ctx context.Context, queries []index.Query) ([]index.Entry, error) {
        err := c.index.QueryPages(ctx, queries, func(query index.Query, resp index.ReadBatchResult) bool {
            ...
        })
        ...
    }
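The fragment above elides the body of lookupEntriesByQueries. As a rough illustration of why the handler degrades with cardinality, the self-contained sketch below uses stand-in types and a hypothetical lookupEntries helper (not Loki's index.Query / index.Entry) to mimic that shape: every index query is paged through and every matching row is appended to one slice, so a request that touches ~64k series collects ~64k entries before any label names can be returned.

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-in types for illustration only; the real code uses index.Query and
// index.Entry from Loki's series index store.
type query struct{ hashValue string }
type entry struct{ hashValue, rangeValue string }

// lookupEntries mimics the shape of lookupEntriesByQueries: each query is
// paged through and every row is appended to a single slice, so the work and
// the intermediate result both grow linearly with series cardinality.
func lookupEntries(queries []query, pages func(q query) [][]entry) []entry {
	var (
		mu      sync.Mutex
		entries []entry
		wg      sync.WaitGroup
	)
	for _, q := range queries {
		wg.Add(1)
		go func(q query) {
			defer wg.Done()
			for _, page := range pages(q) { // results arrive page by page
				mu.Lock()
				entries = append(entries, page...)
				mu.Unlock()
			}
		}(q)
	}
	wg.Wait()
	return entries
}

func main() {
	// Fake index returning ~64k rows for a single label query, roughly the
	// series count shown in the debug logs above.
	pages := func(q query) [][]entry {
		page := make([]entry, 64000)
		for i := range page {
			page[i] = entry{hashValue: q.hashValue, rangeValue: fmt.Sprintf("series-%d", i)}
		}
		return [][]entry{page}
	}
	got := lookupEntries([]query{{hashValue: "1662_qamopdln:logs"}}, pages)
	fmt.Println("entries collected:", len(got))
}
```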


To Reproduce Steps to reproduce the behavior:

  1. Started Loki (SHA or version)
  2. Started Promtail (SHA or version) to tail '...'
  3. Query: {} term

Expected behavior

Environment:

Screenshots, Promtail config, or terminal output

(Screenshots attached in the original issue, including one showing the slow label request.)

honganan commented 2 years ago

Our scenario does not have that many labels, but the label scan is still the performance bottleneck. When querying a big tenant, one query shard needs to scan over 20k chunk IDs from Cassandra, which takes several seconds, and CPU usage momentarily spikes to 100%.

I am wondering whether we could split by time shards, then compress and store the IDs for every stream.
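As a rough illustration of the "split by time shards, then compress and store IDs per stream" idea, here is a generic delta-plus-varint encoding sketch. compressIDs and decompressIDs are hypothetical helpers, not Loki code, and real chunk IDs are strings, so they would first need a numeric or dictionary mapping.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// compressIDs sorts a list of numeric IDs and stores the gaps between
// consecutive IDs as varints. For dense, mostly sequential IDs this packs a
// 20k-entry list into a few bytes per ID.
func compressIDs(ids []uint64) []byte {
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	buf := make([]byte, 0, len(ids))
	tmp := make([]byte, binary.MaxVarintLen64)
	var prev uint64
	for _, id := range ids {
		n := binary.PutUvarint(tmp, id-prev) // store the gap, not the absolute value
		buf = append(buf, tmp[:n]...)
		prev = id
	}
	return buf
}

// decompressIDs reverses compressIDs.
func decompressIDs(buf []byte) []uint64 {
	var (
		ids  []uint64
		prev uint64
	)
	for len(buf) > 0 {
		delta, n := binary.Uvarint(buf)
		buf = buf[n:]
		prev += delta
		ids = append(ids, prev)
	}
	return ids
}

func main() {
	// 20k roughly sequential IDs, like the chunk IDs one query shard scans.
	ids := make([]uint64, 20000)
	for i := range ids {
		ids[i] = 1_000_000 + uint64(i)*3
	}
	packed := compressIDs(ids)
	fmt.Printf("raw: %d bytes, packed: %d bytes, restored: %d ids\n",
		len(ids)*8, len(packed), len(decompressIDs(packed)))
}
```

Whether this helps in practice would depend on how the per-stream lists are keyed by time shard and on the ID-to-integer mapping, but it shows where the expected space and scan savings would come from.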

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly go through closed issues that have a stale label, sorted by thumbs-up count.

We may also:

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.