linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.

Get cluster load API (sometimes) ignores `start` and `end` arguments #2154

Open AlbertoPeon opened 1 month ago

AlbertoPeon commented 1 month ago

Hello,

We have noticed that the `GET /kafkacruisecontrol/load` endpoint ignores the `start`, `end` and `time` parameters under some conditions, which we have not yet been able to identify.

This is easy to reproduce with cccli. For instance, retrieving the cluster load for a fixed 1-hour time window does not always return the same results.

The first run returns the correct averages across every dimension for the 1-hour time window.

$  cccli -a kafka-dev-cruise-control-headless:9090 load --add-parameter start=1716454190232 end=1716457790233
Starting long-running poll of http://kafka-dev-cruise-control-headless:9090/kafkacruisecontrol/load?allow_capacity_estimation=False&start=1716454190232&end=1716457790233

HOST         BROKER      RACK         DISK_CAP(MB)            DISK(MB)/_(%)_            CORE_NUM         CPU(%)          NW_IN_CAP(KB/s)       LEADER_NW_IN(KB/s)     FOLLOWER_NW_IN(KB/s)         NW_OUT_CAP(KB/s)        NW_OUT(KB/s)       PNW_OUT(KB/s)    LEADERS/REPLICAS
-,         10000,us-east-1a,         1192092.000,          10466.635/00.88,                  1,         9.706,               97656.000,                 361.137,                 711.288,              195312.000,           1081.202,           3224.528,            62/185
-,         10001,us-east-1b,         1192092.000,          10466.635/00.88,                  1,         9.014,               97656.000,                 319.845,                 752.581,              195312.000,            962.311,           3224.528,            58/185
-,         10002,us-east-1c,         1192092.000,          10466.635/00.88,                  1,        12.585,               97656.000,                 391.444,                 680.982,              195312.000,           1181.015,           3224.528,            65/185
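For reference, the request cccli issues above can be rebuilt by hand. This is a minimal Python sketch: `load_url` and `window_minutes` are hypothetical helper names (not part of cccli or Cruise Control), and only the endpoint path and the query parameters visible in the output above are taken from this report.

```python
from urllib.parse import urlencode


def load_url(base, start_ms=None, end_ms=None):
    """Build the load URL in the same shape cccli printed above.

    start_ms / end_ms are epoch timestamps in milliseconds; they are
    omitted from the query string when not given.
    """
    params = {"allow_capacity_estimation": "False"}
    if start_ms is not None:
        params["start"] = start_ms
    if end_ms is not None:
        params["end"] = end_ms
    return f"{base}/kafkacruisecontrol/load?{urlencode(params)}"


def window_minutes(start_ms, end_ms):
    """Length of the requested window in minutes."""
    return (end_ms - start_ms) / 60_000


BASE = "http://kafka-dev-cruise-control-headless:9090"
START_MS, END_MS = 1716454190232, 1716457790233

url = load_url(BASE, START_MS, END_MS)
minutes = window_minutes(START_MS, END_MS)
```

With the timestamps used in this report, `minutes` is 60 to within a millisecond, so the request really does ask for a 1-hour window.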

However, waiting a bit and running the same query outputs different values. We suspect these values correspond to the cluster load for the default time window, which means the `start` and `end` parameters are being ignored. We believe the default window spans from the earliest available timestamp to the current time.

$ cccli -a kafka-dev-cruise-control-headless:9090 load --add-parameter start=1716454190232 end=1716457790233
Starting long-running poll of http://kafka-dev-cruise-control-headless:9090/kafkacruisecontrol/load?allow_capacity_estimation=False&start=1716454190232&end=1716457790233

HOST         BROKER      RACK         DISK_CAP(MB)            DISK(MB)/_(%)_            CORE_NUM         CPU(%)          NW_IN_CAP(KB/s)       LEADER_NW_IN(KB/s)     FOLLOWER_NW_IN(KB/s)         NW_OUT_CAP(KB/s)        NW_OUT(KB/s)       PNW_OUT(KB/s)    LEADERS/REPLICAS
-,         10000,us-east-1a,         1192092.000,          10931.121/00.92,                  1,        11.381,               97656.000,                 386.458,                 762.798,              195312.000,           1155.897,           3446.797,            62/185
-,         10001,us-east-1b,         1192092.000,          10931.121/00.92,                  1,        10.193,               97656.000,                 343.802,                 805.454,              195312.000,           1032.157,           3446.797,            58/185
-,         10002,us-east-1c,         1192092.000,          10931.121/00.92,                  1,        10.482,               97656.000,                 418.996,                 730.259,              195312.000,           1258.743,           3446.797,            65/185

In fact, running the same command without the `start` and `end` arguments returns exactly the same values as the previous command:

$ cccli -a kafka-dev-cruise-control-headless:9090 load
Starting long-running poll of http://kafka-dev-cruise-control-headless:9090/kafkacruisecontrol/load?allow_capacity_estimation=False

HOST         BROKER      RACK         DISK_CAP(MB)            DISK(MB)/_(%)_            CORE_NUM         CPU(%)          NW_IN_CAP(KB/s)       LEADER_NW_IN(KB/s)     FOLLOWER_NW_IN(KB/s)         NW_OUT_CAP(KB/s)        NW_OUT(KB/s)       PNW_OUT(KB/s)    LEADERS/REPLICAS
-,         10000,us-east-1a,         1192092.000,          10931.121/00.92,                  1,        11.381,               97656.000,                 386.458,                 762.798,              195312.000,           1155.897,           3446.797,            62/185
-,         10001,us-east-1b,         1192092.000,          10931.121/00.92,                  1,        10.193,               97656.000,                 343.802,                 805.454,              195312.000,           1032.157,           3446.797,            58/185
-,         10002,us-east-1c,         1192092.000,          10931.121/00.92,                  1,        10.482,               97656.000,                 418.996,                 730.259,              195312.000,           1258.743,           3446.797,            65/185

Periodically running a command with the `start` and `end` (or `time`) parameters inconsistently returns one result or the other.
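To tell the two cases apart automatically, one option is to parse the tabular output and compare a windowed response against a default one captured at about the same time. The sketch below assumes the output layout shown in this report (a whitespace-separated header followed by comma-separated rows); `parse_load` and `window_was_ignored` are hypothetical names, not part of cccli.

```python
def parse_load(text):
    """Parse cccli `load` output into {broker_id: {column: raw_value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The header is the line whose first two tokens are HOST and BROKER.
    header_idx = next(
        i for i, ln in enumerate(lines) if ln.split()[:2] == ["HOST", "BROKER"]
    )
    columns = lines[header_idx].split()
    brokers = {}
    for ln in lines[header_idx + 1:]:
        fields = [f.strip() for f in ln.split(",")]
        if len(fields) != len(columns):
            continue  # skip anything that is not a data row
        row = dict(zip(columns, fields))
        brokers[int(row["BROKER"])] = row
    return brokers


def window_was_ignored(windowed, default, column="CPU(%)", tol=1e-6):
    """True if the windowed response matches the default one on `column`.

    A match suggests start/end were not applied. It is only a heuristic:
    a perfectly flat load would also produce identical values.
    """
    if windowed.keys() != default.keys():
        return False
    return all(
        abs(float(windowed[b][column]) - float(default[b][column])) <= tol
        for b in windowed
    )
```

Feeding it the three outputs above, the second (windowed) response matches the third (default) one, while the first does not.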

Plotting these results on a graph confirms that the reported cluster load oscillates between the two time windows:

[Image: Screenshot 2024-05-23 at 13 58 23]

In blue we can see the live system metrics while in purple we see the cluster load as reported by the Cruise Control endpoint.

After cutting Kafka traffic in half, the Cruise Control load reflects the drop after a delay, which makes sense because it is not live data but the average accumulated over the last time window. What is not expected is that the values show "waves". From observation, we suspect the low points of the waves correspond to queries answered for the requested 1-hour time window, since they converge with the system metric after about that long. The high points take longer to converge, approximately 4 hours, which we suspect is the default time window.
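The reasoning behind that suspicion can be checked with a toy model. The sketch below is not Cruise Control code: it only assumes that the reported load is a plain average over a trailing window, that traffic steps from 100 to 50 (arbitrary units) at t = 0, and that responses alternate between a 1-hour window and a 4-hour window (the 4-hour figure is our guess at the default). `trailing_average` and `alternating_series` are hypothetical names.

```python
def trailing_average(t, window, before=100.0, after=50.0):
    """Average over [t - window, t] of a load that steps from
    `before` to `after` at t = 0. Times are in hours."""
    if t <= 0:
        return before
    if t >= window:
        return after
    # Part of the window still covers the old load level.
    return (before * (window - t) + after * t) / window


def alternating_series(times, short=1.0, long=4.0):
    """Simulate polls whose answers alternate between two window sizes."""
    return [
        trailing_average(t, short if i % 2 == 0 else long)
        for i, t in enumerate(times)
    ]


# Poll every 30 minutes for 5 hours after the traffic drop.
times = [0.5 * i for i in range(11)]
series = alternating_series(times)
```

In this model the 1-hour answers reach the new level one hour after the drop, the 4-hour answers only after four hours, and interleaving the two produces the same kind of wave we see in the graph.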

Could you help me understand why this is happening and how to prevent it? Thank you very much!