Open liguozhong opened 2 years ago
@liguozhong can you please provide a pprof profile?
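For readers unfamiliar with the request, here is a minimal sketch of collecting a goroutine profile from inside any Go process with the standard `runtime/pprof` package. The helper name `goroutineProfile` is made up for illustration and is not part of Loki; a running distributor would more likely be profiled over HTTP, assuming it exposes the usual `net/http/pprof` handlers.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// goroutineProfile (hypothetical helper) returns the text form of the
// current goroutine profile. With debug=1, goroutines that share an
// identical stack are grouped together with a count, which makes a hot
// spawn site easy to spot.
func goroutineProfile() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	profile, err := goroutineProfile()
	if err != nil {
		fmt.Println("failed to collect profile:", err)
		return
	}
	fmt.Println("live goroutines:", runtime.NumGoroutine())
	fmt.Print(profile)
}
```

In the grouped output, a single stack with a very large count points at the line that spawned those goroutines.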
tracker.samplesPending.Store(int32(len(streams)))
for ingester, samples := range samplesByIngester {
	go func(ingester ring.InstanceDesc, samples []*streamTracker) {
		// Use a background context to make sure all ingesters get samples even if we return early
		localCtx, cancel := context.WithTimeout(context.Background(), d.clientCfg.RemoteTimeout)
		defer cancel()
		localCtx = user.InjectOrgID(localCtx, userID)
		if sp := opentracing.SpanFromContext(ctx); sp != nil {
			localCtx = opentracing.ContextWithSpan(localCtx, sp)
		}
		d.sendSamples(localCtx, ingester, samples, &tracker)
	}(ingesterDescs[ingester], samples)
}
More than 12,000 goroutines are at pkg/distributor/distributor.go:341.
@liguozhong How many ingesters do you run in this cluster? Is the number of ingesters stable in the memberlist? AFAIU the number of goroutines should stay proportional to the number of healthy ingesters, right?
How many ingesters do you run in this cluster?
72 ingesters.
The distributor goroutine count rises only when an ingester is unavailable. The distributor should not have 10,000 goroutines, regardless of the state of the ingesters.
WOW! 72 ingesters. That's huge. Can you graph the availability/unavailability of ingesters? e.g. joining/leaving the ring? I bet the map has more items than 72 because leaving/joining the ring is time sensitive.
Is your feature request related to a problem? Please describe.
There are too many distributor goroutines. Can we add a max-goroutines limit configuration to prevent this from happening? Our cluster has reached 12,000 goroutines.
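To illustrate what a max-goroutines limit could look like, here is a minimal, generic sketch of a bounded fan-out using a buffered channel as a semaphore. It is not Loki's implementation: `sendAll`, `maxInFlight`, and the string batches are hypothetical names chosen for the example. Note that a blocking limit like this adds backpressure on the caller, which differs from the current behavior of dispatching to every ingester immediately.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sendAll (hypothetical) calls send once per batch while keeping at most
// maxInFlight goroutines alive at any time. It returns the peak number of
// concurrent sends observed, so the bound can be checked.
func sendAll(batches []string, maxInFlight int, send func(string)) int32 {
	sem := make(chan struct{}, maxInFlight) // one slot per allowed goroutine
	var wg sync.WaitGroup
	var inFlight, peak int32

	for _, b := range batches {
		sem <- struct{}{} // blocks while maxInFlight sends are still running
		wg.Add(1)
		go func(b string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this send finishes

			// Track the running count and record the highest value seen.
			n := atomic.AddInt32(&inFlight, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			send(b)
			atomic.AddInt32(&inFlight, -1)
		}(b)
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	batches := make([]string, 100)
	for i := range batches {
		batches[i] = fmt.Sprintf("batch-%d", i)
	}
	peak := sendAll(batches, 8, func(string) {})
	fmt.Println("peak concurrent sends within limit:", peak <= 8)
}
```

The trade-off is that a slow or unavailable ingester now delays later sends instead of accumulating goroutines, so any real limit would need to be weighed against the remote timeout.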