dctrwatson opened this issue 10 months ago
What options do you have enabled? Have you tried --writer.intern?
To add to that, it would be great to see the full configuration of the receiver, including flags and the hashring. It looks like it's stuck in GC, so I wonder if there is a routing loop.
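One quick way to sanity-check the GC theory while gathering a profile is to look at the Go runtime metrics on the ingester's HTTP port. A rough sketch, assuming the default /metrics instrumentation on port 10902 (the hostname is a placeholder for your receive pod, not taken from this cluster):

# Inspect GC pause quantiles and goroutine count on the suspect receiver.
# "thanos-receive-0" is a placeholder; point this at your pod's HTTP endpoint.
curl -s http://thanos-receive-0:10902/metrics | grep -E 'go_gc_duration_seconds|go_goroutines'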
Here is the ingester config:

receive
--log.level=info
--log.format=logfmt
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--remote-write.address=0.0.0.0:19291
--objstore.config=$(OBJSTORE_CONFIG)
--tsdb.path=/var/thanos/receive
--label=thanos_receive_replica="$(NAME)"
--label=receive="true"
--tsdb.retention=26h
--receive.local-endpoint=$(NAME).thanos-receive-headless.$(NAMESPACE).svc.cluster.local.:10901
--grpc-server-tls-cert=/cert/tls.crt
--grpc-server-tls-key=/cert/tls.key
--grpc-server-tls-client-ca=/cert/ca.crt
--label=metrics_namespace="global"
--receive.tenant-label-name=cluster
--receive.default-tenant-id=unknown
--receive.hashrings-file-refresh-interval=1m
--remote-write.server-tls-cert=/cert/tls.crt
--remote-write.server-tls-client-ca=/cert/ca.crt
--remote-write.server-tls-key=/cert/tls.key
--tsdb.memory-snapshot-on-shutdown
--tsdb.max-block-duration=1h
--tsdb.min-block-duration=1h
--writer.intern
We're running the distributor (router) with:
receive
--log.level=info
--log.format=logfmt
--grpc-address=0.0.0.0:10901
--http-address=0.0.0.0:10902
--remote-write.address=0.0.0.0:19291
--label=replica="$(NAME)"
--label=receive="true"
--receive.hashrings-file=/var/lib/thanos-receive/hashrings.json
--receive.replication-factor=1
--grpc-server-tls-cert=/cert/tls.crt
--grpc-server-tls-key=/cert/tls.key
--grpc-server-tls-client-ca=/cert/ca.crt
--receive.grpc-compression=snappy
--receive-forward-timeout=30s
--receive.hashrings-algorithm=ketama
--receive.hashrings-file-refresh-interval=1m
--receive.relabel-config=$(RECEIVE_RELABEL_CONFIG)
--receive.tenant-label-name=cluster
--receive.default-tenant-id=unknown
--remote-write.client-tls-ca=/cert/ca.crt
--remote-write.client-tls-cert=/cert/tls.crt
--remote-write.client-tls-key=/cert/tls.key
--remote-write.client-server-name=thanos-receive-headless.monitoring.svc.cluster.local
--remote-write.server-tls-cert=/cert/tls.crt
--remote-write.server-tls-client-ca=/cert/ca.crt
--remote-write.server-tls-key=/cert/tls.key
The hashring is managed by https://github.com/observatorium/thanos-receive-controller
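For reference, the controller writes the standard Thanos hashring file format. A minimal sketch of what /var/lib/thanos-receive/hashrings.json could look like (the hashring name and endpoints below are illustrative placeholders, not taken from this cluster):

[
  {
    "hashring": "default",
    "endpoints": [
      "thanos-receive-0.thanos-receive-headless.monitoring.svc.cluster.local:10901",
      "thanos-receive-1.thanos-receive-headless.monitoring.svc.cluster.local:10901"
    ]
  }
]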
I cannot see anything wrong in the configuration. Maybe you can take a look at an allocation profile to see where objects are being allocated.
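In case it helps, a sketch of how to grab that allocation profile from the built-in pprof endpoints on the receiver's HTTP port (the hostname is a placeholder; assumes port 10902 is reachable from where you run this):

# Allocation profile: shows where objects are being allocated.
go tool pprof -alloc_space http://thanos-receive-0:10902/debug/pprof/heap

# 30s CPU profile, to confirm whether the time really is going to GC.
go tool pprof "http://thanos-receive-0:10902/debug/pprof/profile?seconds=30"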
Is the problematic one the router or the ingester?
Ingester
Please try out the capnproto replication available on main.
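For anyone following along, this is opt-in via a flag on builds from main. A sketch of what to add to the receive args (flag name as of recent main, so double-check thanos receive --help on your build):

# Switch receiver-to-receiver replication to Cap'n Proto (builds from main only).
--receive.replication-protocol=capnproto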
Thanos, Prometheus and Golang version used:
Object Storage Provider: s3
What happened: After running for some time, receive pegs the CPU.
What you expected to happen: CPU usage proportional to the write load.
How to reproduce it (as minimally and precisely as possible):
Full logs to relevant components:
Anything else we need to know: https://pprof.me/aa81313c5472bcec2e81765384e11748