tkestack / kvass

Kvass is a Prometheus horizontal auto-scaling solution. For every Prometheus shard, a sidecar generates a special config file that contains only the subset of targets the Coordinator has assigned to that shard.
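The per-shard filtering can be pictured with the minimal Go sketch below. It is not Kvass's real code. The `Target` type, the `"job/address"` key, and the `assignment` map are hypothetical stand-ins for however the Coordinator actually communicates assignments. The sketch also shows that a shard with no assigned targets ends up with an empty target list, which is the symptom reported in this issue.

```go
package main

import "fmt"

// Target is a hypothetical, simplified scrape target. Real Prometheus targets
// are identified by label sets; job + address stands in for that here.
type Target struct {
	Job     string
	Address string
}

// filterForShard keeps only the targets that the (hypothetical) assignment map
// gives to the named shard. The map shape ("job/address" -> shard name) is an
// illustrative assumption, not Kvass's actual data format.
func filterForShard(all []Target, assignment map[string]string, shard string) []Target {
	var kept []Target
	for _, t := range all {
		key := t.Job + "/" + t.Address
		if assignment[key] == shard {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	all := []Target{
		{"node", "10.0.0.1:9100"},
		{"node", "10.0.0.2:9100"},
		{"app", "10.0.0.3:8080"},
	}
	assignment := map[string]string{
		"node/10.0.0.1:9100": "prometheus-0",
		"node/10.0.0.2:9100": "prometheus-1",
		"app/10.0.0.3:8080":  "prometheus-0",
	}
	// prometheus-2 has nothing assigned, so its generated config is empty.
	for _, shard := range []string{"prometheus-0", "prometheus-1", "prometheus-2"} {
		fmt.Println(shard, filterForShard(all, assignment, shard))
	}
}
```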
Apache License 2.0

Coordinator creates a new Prometheus but assigns no targets to it #51

Open: like-inspur opened this issue 3 years ago

like-inspur commented 3 years ago

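At runtime the coordinator created 2 Prometheus instances. Later, although neither the scraped targets nor the series changed, the coordinator created a 3rd Prometheus. Inspecting the 3rd Prometheus showed no targets assigned to it by the coordinator, so it was running with an empty configuration. Kvass version: 0.1.0.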

RayHuangCN commented 3 years ago

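Do you have more detailed information, for example the series counts on the first 2 Prometheus instances, whether this reproduces every time, or the coordinator logs?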

like-inspur commented 3 years ago

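Based on the series distribution, 2 Prometheus instances should have been enough at startup, but the coordinator allocated 3. Checking each Prometheus showed that prometheus-1 had no targets. Presumably the coordinator judged prometheus-1 to be abnormal, so it scaled out by one replica, creating prometheus-2 and assigning targets to it. Because scale-down was enabled, after a while the targets on prometheus-2 were migrated to prometheus-1 and the Prometheus replica count dropped from 3 to 2. The root cause this time is the coordinator's incorrect judgment of prometheus-1. Version 0.1.4. Coordinator logs: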
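The reporter's hypothesis can be illustrated with a simplified Go model. This is NOT Kvass's actual scaling algorithm. The `Shard` type, the `Healthy` flag, the per-shard series limit of 50000, and the reading of "need space 37517" as "series still to be placed" are all assumptions made for illustration. The model counts free capacity only on shards judged healthy, so wrongly marking an empty, freshly created prometheus-1 as unhealthy is enough to push the replica count from 2 to 3.

```go
package main

import "fmt"

// Shard is a hypothetical view of one Prometheus shard as a coordinator might
// track it. All fields are illustrative assumptions.
type Shard struct {
	Name    string
	Healthy bool
	Series  int64 // series currently assigned to this shard
}

// requiredReplicas is a simplified model (not Kvass's real logic): free
// capacity is summed only over shards judged healthy, and replicas are added
// until the unplaced series fit under the per-shard limit.
func requiredReplicas(shards []Shard, unplaced, limit int64) int {
	free := int64(0)
	for _, s := range shards {
		if s.Healthy && s.Series < limit {
			free += limit - s.Series
		}
	}
	replicas := len(shards)
	for unplaced > free {
		replicas++
		free += limit
	}
	return replicas
}

func main() {
	const limit = 50000 // hypothetical per-shard series limit
	const need = 37517  // from "need space 37517", read as unplaced series (assumption)

	ok := []Shard{{"prometheus-0", true, 40000}, {"prometheus-1", true, 0}}
	misjudged := []Shard{{"prometheus-0", true, 40000}, {"prometheus-1", false, 0}}

	fmt.Println(requiredReplicas(ok, need, limit))        // 2
	fmt.Println(requiredReplicas(misjudged, need, limit)) // 3
}
```

In this model the extra replica comes purely from the health judgment, not from any change in targets or series, which matches what the reporter observed.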

time="2021-06-17T11:22:37Z" level=info msg="need space 64107" component=coordinator
time="2021-06-17T11:22:37Z" level=info msg="change scale to 2" component="shard manager" sts=prometheus
time="2021-06-17T11:22:47Z" level=info msg="need space 37517" component=coordinator
time="2021-06-17T11:22:47Z" level=info msg="prometheus-0 need update targets" component="shard manager" shard=prometheus-0 sts=prometheus
time="2021-06-17T11:22:47Z" level=info msg="change scale to 3" component="shard manager" sts=prometheus
time="2021-06-17T11:22:57Z" level=info msg="prometheus-2 need update targets" component="shard manager" shard=prometheus-2 sts=prometheus
time="2021-06-17T11:22:57Z" level=info msg="prometheus-0 need update targets" component="shard manager" shard=prometheus-0 sts=prometheus
time="2021-06-17T13:00:13Z" level=info msg="try mark transfer all targets from prometheus-2" component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (226) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (255) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (1631) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (375) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (236) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (1715) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (724) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (5430) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (762) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (693) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (840) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (507) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (718) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (299) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (199) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (2107) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (199) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (811) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (209) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (485) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (209) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (224) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (199) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (5148) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="prometheus-2 need update targets" component="shard manager" shard=prometheus-2 sts=prometheus
time="2021-06-17T13:00:13Z" level=info msg="prometheus-1 need update targets" component="shard manager" shard=prometheus-1 sts=prometheus
time="2021-06-17T13:00:53Z" level=info msg="prometheus-2 need update targets" component="shard manager" shard=prometheus-2 sts=prometheus
time="2021-06-17T13:01:03Z" level=info msg="prometheus-2 need update targets" component="shard manager" shard=prometheus-2 sts=prometheus
time="2021-06-17T13:02:33Z" level=info msg="try mark transfer all targets from prometheus-2" component=coordinator
time="2021-06-17T13:02:33Z" level=info msg="prometheus-2 need update targets" component="shard manager" shard=prometheus-2 sts=prometheus
time="2021-06-17T14:02:37Z" level=info msg="prometheus-2 is remove able" component=coordinator
time="2021-06-17T14:02:37Z" level=info msg="change scale to 2" component="shard manager" sts=prometheus
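To quantify the scale-down migration, the transfer lines can be summarized per source/destination pair. The Go helper below is a small sketch. Its regular expression is written against the exact message format shown in the logs above, and `summarizeTransfers` is a hypothetical name, not part of Kvass.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// transferRe matches coordinator lines such as:
// msg="transfer target from prometheus-2 to prometheus-1 series = (226) "
var transferRe = regexp.MustCompile(`transfer target from (\S+) to (\S+) series = \((\d+)\)`)

// summarizeTransfers returns, per "from->to" pair, how many targets were
// transferred and the total number of series they carry.
func summarizeTransfers(logs string) (map[string]int, map[string]int64) {
	targets := map[string]int{}
	series := map[string]int64{}
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		m := transferRe.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a transfer line
		}
		n, err := strconv.ParseInt(m[3], 10, 64)
		if err != nil {
			continue
		}
		key := m[1] + "->" + m[2]
		targets[key]++
		series[key] += n
	}
	return targets, series
}

func main() {
	logs := `time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (226) " component=coordinator
time="2021-06-17T13:00:13Z" level=info msg="transfer target from prometheus-2 to prometheus-1 series = (5148) " component=coordinator
time="2021-06-17T14:02:37Z" level=info msg="change scale to 2" component="shard manager" sts=prometheus`

	targets, series := summarizeTransfers(logs)
	for k := range targets {
		fmt.Printf("%s: %d targets, %d series\n", k, targets[k], series[k])
	}
	// prints: prometheus-2->prometheus-1: 2 targets, 5374 series
}
```

Fed the complete log above, this reports 24 targets carrying 24201 series moved from prometheus-2 to prometheus-1. All of them were transferred back onto the shard that had previously been left empty.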