Open ovelicka opened 2 months ago
Hey there, @ovelicka! I'm here to help you with your Thanos issue. I'm a bot 🤖 that can help you solve bugs, answer questions, and become a contributor.
I found some similar issues and discussions that might help you resolve the problem with duplicate stores being detected in your Thanos Query setup:
To resolve the issue of duplicate stores being detected, ensure that the store addresses provided in your configuration are unique. The `removeDuplicateEndpointSpecs` function in Thanos handles duplicates by logging a warning and incrementing the `thanos_query_duplicated_store_addresses_total` metric whenever it sees the same store address more than once.
Here is the relevant code snippet that handles the removal of duplicate store addresses:
func removeDuplicateEndpointSpecs(logger log.Logger, duplicatedStores prometheus.Counter, specs []*query.GRPCEndpointSpec) []*query.GRPCEndpointSpec {
	set := make(map[string]*query.GRPCEndpointSpec)
	for _, spec := range specs {
		addr := spec.Addr()
		if _, ok := set[addr]; ok {
			level.Warn(logger).Log("msg", "Duplicate store address is provided", "addr", addr)
			duplicatedStores.Inc()
		}
		set[addr] = spec
	}
	deduplicated := make([]*query.GRPCEndpointSpec, 0, len(set))
	for _, value := range set {
		deduplicated = append(deduplicated, value)
	}
	return deduplicated
}
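The same map-based idea can be tried in isolation on a plain list of addresses. The following is a minimal sketch, not Thanos code: `dedupeAddrs` and the sample addresses are hypothetical, and unlike the snippet above it keeps first-seen order and returns the duplicate count instead of incrementing a Prometheus counter.

```go
package main

import "fmt"

// dedupeAddrs is a hypothetical helper (not part of Thanos). It returns addrs
// with repeats removed, keeping first-seen order, together with the number of
// repeated entries it dropped.
func dedupeAddrs(addrs []string) (unique []string, duplicates int) {
	seen := make(map[string]struct{}, len(addrs))
	for _, addr := range addrs {
		if _, ok := seen[addr]; ok {
			// Thanos would log a warning and bump the counter at this point.
			duplicates++
			continue
		}
		seen[addr] = struct{}{}
		unique = append(unique, addr)
	}
	return unique, duplicates
}

func main() {
	// Made-up addresses for illustration only.
	addrs := []string{"10.0.0.1:10901", "10.0.0.2:10901", "10.0.0.1:10901"}
	unique, dups := dedupeAddrs(addrs)
	fmt.Println(unique, dups)
}
```

Running a list of resolved addresses through a helper like this shows how many increments of the metric a single refresh would produce.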
Ensure that your configuration does not include duplicate store addresses. If you are using DNS-based discovery, make sure that the DNS records do not resolve to the same store address more than once. The `prepareEndpointSet` function applies `removeDuplicateEndpointSpecs` to the addresses returned by each DNS provider:
func prepareEndpointSet(
	g *run.Group,
	logger log.Logger,
	reg *prometheus.Registry,
	dnsProviders []*dns.Provider,
	duplicatedStores prometheus.Counter,
	strictStores []string,
	strictEndpoints []string,
	endpointGroupAddrs []string,
	strictEndpointGroups []string,
	dialOpts []grpc.DialOption,
	unhealthyStoreTimeout time.Duration,
	endpointInfoTimeout time.Duration,
	queryConnMetricLabels ...string,
) *query.EndpointSet {
	endpointSet := query.NewEndpointSet(
		time.Now,
		logger,
		reg,
		func() (specs []*query.GRPCEndpointSpec) {
			// Add strict & static nodes.
			for _, addr := range strictStores {
				specs = append(specs, query.NewGRPCEndpointSpec(addr, true))
			}
			for _, addr := range strictEndpoints {
				specs = append(specs, query.NewGRPCEndpointSpec(addr, true))
			}
			for _, dnsProvider := range dnsProviders {
				var tmpSpecs []*query.GRPCEndpointSpec
				for _, addr := range dnsProvider.Addresses() {
					tmpSpecs = append(tmpSpecs, query.NewGRPCEndpointSpec(addr, false))
				}
				tmpSpecs = removeDuplicateEndpointSpecs(logger, duplicatedStores, tmpSpecs)
				specs = append(specs, tmpSpecs...)
			}
			for _, eg := range endpointGroupAddrs {
				spec := query.NewGRPCEndpointSpec(fmt.Sprintf("thanos:///%s", eg), false, extgrpc.EndpointGroupGRPCOpts()...)
				specs = append(specs, spec)
			}
			for _, eg := range strictEndpointGroups {
				spec := query.NewGRPCEndpointSpec(fmt.Sprintf("thanos:///%s", eg), true, extgrpc.EndpointGroupGRPCOpts()...)
				specs = append(specs, spec)
			}
			return specs
		},
		dialOpts,
		unhealthyStoreTimeout,
		endpointInfoTimeout,
		queryConnMetricLabels...,
	)
	// Periodically update the store set with the addresses we see in our cluster.
	{
		ctx, cancel := context.WithCancel(context.Background())
		g.Add(func() error {
			return runutil.Repeat(5*time.Second, ctx.Done(), func() error {
				endpointSet.Update(ctx)
				return nil
			})
		}, func(error) {
			cancel()
		})
	}
	return endpointSet
}
Additionally, consider the `--store-strict` flag of Thanos Query, which accepts only statically specified nodes and always treats them as part of the active store set. This may help avoid issues with dynamically discovered store nodes that might cause duplicates. The changelog describes the flag as follows:
* Add a new flag to Thanos Query `--store-strict` which will only accept statically specified nodes and Thanos Query will always retain the last successfully retrieved information of them via the `Info()` gRPC method. Thus, they will always be considered as part of the active store set.
By ensuring that your store addresses are unique and properly configured, and by using the `--store-strict` flag if applicable, you can prevent the `thanos_query_duplicated_store_addresses_total` metric from increasing [5][6].
Thanos version used: quay.io/thanos/thanos:v0.34.1 (deployed in K8s)
What happened: We are encountering an issue with duplicate stores being detected in our Thanos Query setup. The `thanos_query_duplicated_store_addresses_total` metric is increasing, indicating that the same store is being registered multiple times. Despite various changes and configurations, we haven't been able to resolve the issue.
Anything else we need to know:
Query configuration:
We have tried specifying the endpoint in various formats, such as:
--store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc
or directly as a static IP:port or the DNS name of the endpoint, but with no change.
Running `nslookup` directly from the thanos-query pod returns something like this:
Full logs to relevant components: