streamnative / pulsar-archived

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org
Apache License 2.0

ISSUE-12928: Rebalancing functions does nothing #3307

Open sijie opened 3 years ago

sijie commented 3 years ago

Original Issue: apache/pulsar#12928


Describe the bug

Hitting the function rebalance endpoint (/admin/v2/worker/rebalance) does not appear to work consistently in Pulsar 2.7.2.

To Reproduce

Steps to reproduce the behavior:

First, we look at the function assignments:

$ curl fab08.example.domain.com:8080/admin/v2/worker/assignments -H "Authorization: Bearer eyJhb...mNog" | python -m json.tool

{
"c-pulsar-pcdc1-green-test-fw-fab09.example.domain.com-8080": [
"amplitude/processing/random-2:5",
"amplitude/processing/random-2:4",
"amplitude/processing/random-2:3",
"amplitude/processing/random-2:2",
"amplitude/processing/random-2:9",
"amplitude/processing/random-2:8",
"amplitude/processing/random-2:7",
"amplitude/processing/random-2:6",
"amplitude/processing/random-2:1",
"amplitude/processing/random-2:0",
"amplitude/processing/random-1:23",
"amplitude/processing/random-1:21",
"amplitude/processing/random-1:22",
"amplitude/processing/random-1:20",
"amplitude/processing/random-1:14",
"amplitude/processing/random-1:6",
"amplitude/processing/random-1:5",
"amplitude/processing/random-1:15",
"amplitude/processing/random-1:4",
"amplitude/processing/random-1:12",
"amplitude/processing/random-1:13",
"amplitude/processing/random-1:3",
"amplitude/processing/random-2:23",
"amplitude/processing/random-1:10",
"amplitude/processing/random-2:22",
"amplitude/processing/random-1:11",
"amplitude/processing/random-1:9",
"amplitude/processing/random-2:21",
"amplitude/processing/random-1:8",
"amplitude/processing/random-2:20",
"amplitude/processing/random-1:7",
"amplitude/processing/random-1:18",
"amplitude/processing/random-1:19",
"amplitude/processing/random-1:16",
"amplitude/processing/random-1:17",
"amplitude/processing/random-1:2",
"amplitude/processing/random-1:1",
"amplitude/processing/random-1:0",
"amplitude/processing/random-2:16",
"amplitude/processing/random-2:15",
"amplitude/processing/random-2:14",
"amplitude/processing/random-2:13",
"amplitude/processing/random-2:12",
"amplitude/processing/random-2:11",
"amplitude/processing/random-2:10",
"amplitude/processing/random-2:19",
"amplitude/processing/random-2:18",
"amplitude/processing/random-2:17"
]
}
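To make the imbalance easier to see, the assignments JSON can be tallied per worker and per function. This is a minimal Python sketch, assuming the response shape shown above (a worker ID mapped to a list of `tenant/namespace/function:instanceId` strings); `instances_per_worker` and `instances_per_function` are hypothetical helper names, not part of Pulsar.

```python
from __future__ import annotations

from collections import Counter

WORKER = "c-pulsar-pcdc1-green-test-fw-fab09.example.domain.com-8080"

# Same 48 entries as the snapshot above (random-1 and random-2, instances
# 0-23 each), rebuilt programmatically; list order differs from the output.
snapshot = {
    WORKER: [
        f"amplitude/processing/random-{n}:{i}" for n in (1, 2) for i in range(24)
    ]
}


def instances_per_worker(assignments: dict[str, list[str]]) -> dict[str, int]:
    """Number of function instances assigned to each worker ID."""
    return {worker: len(instances) for worker, instances in assignments.items()}


def instances_per_function(assignments: dict[str, list[str]]) -> Counter:
    """Number of instances per function, summed across all workers.

    Entries look like "tenant/namespace/function:instanceId", so the
    function name is everything before the last ':'.
    """
    counts: Counter = Counter()
    for instances in assignments.values():
        for entry in instances:
            function_name, _sep, _instance_id = entry.rpartition(":")
            counts[function_name] += 1
    return counts


if __name__ == "__main__":
    print(instances_per_worker(snapshot))
    print(dict(instances_per_function(snapshot)))
```

On this snapshot it reports 48 instances on the single worker listed, 24 for each of `random-1` and `random-2`.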

Next, we trigger a function rebalance:

$ curl fab08.example.domain.com:8080/admin/v2/worker/rebalance -X PUT -H "Authorization: Bearer eyJ...mNog"

Checking function assignments again after a few minutes shows no changes, as demonstrated below:

$ curl fab08.example.domain.com:8080/admin/v2/worker/assignments -H "Authorization: Bearer eyJ...mNog" | python -m json.tool

{
"c-pulsar-pcdc1-green-test-fw-fab09.example.domain.com-8080": [
"amplitude/processing/random-2:5",
"amplitude/processing/random-2:4",
"amplitude/processing/random-2:3",
"amplitude/processing/random-2:2",
"amplitude/processing/random-2:9",
"amplitude/processing/random-2:8",
"amplitude/processing/random-2:7",
"amplitude/processing/random-2:6",
"amplitude/processing/random-2:1",
"amplitude/processing/random-2:0",
"amplitude/processing/random-1:23",
"amplitude/processing/random-1:21",
"amplitude/processing/random-1:22",
"amplitude/processing/random-1:20",
"amplitude/processing/random-1:14",
"amplitude/processing/random-1:6",
"amplitude/processing/random-1:5",
"amplitude/processing/random-1:15",
"amplitude/processing/random-1:4",
"amplitude/processing/random-1:12",
"amplitude/processing/random-1:13",
"amplitude/processing/random-1:3",
"amplitude/processing/random-2:23",
"amplitude/processing/random-1:10",
"amplitude/processing/random-2:22",
"amplitude/processing/random-1:11",
"amplitude/processing/random-1:9",
"amplitude/processing/random-2:21",
"amplitude/processing/random-1:8",
"amplitude/processing/random-2:20",
"amplitude/processing/random-1:7",
"amplitude/processing/random-1:18",
"amplitude/processing/random-1:19",
"amplitude/processing/random-1:16",
"amplitude/processing/random-1:17",
"amplitude/processing/random-1:2",
"amplitude/processing/random-1:1",
"amplitude/processing/random-1:0",
"amplitude/processing/random-2:16",
"amplitude/processing/random-2:15",
"amplitude/processing/random-2:14",
"amplitude/processing/random-2:13",
"amplitude/processing/random-2:12",
"amplitude/processing/random-2:11",
"amplitude/processing/random-2:10",
"amplitude/processing/random-2:19",
"amplitude/processing/random-2:18",
"amplitude/processing/random-2:17"
]
}
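Because both snapshots are long, a small diff makes "no changes" easy to confirm after triggering a rebalance. This is a sketch in Python under the same assumption about the response shape; `moved_instances` is a hypothetical helper, not part of Pulsar.

```python
from __future__ import annotations


def instance_to_worker(assignments: dict[str, list[str]]) -> dict[str, str]:
    """Invert worker -> instances into instance -> worker."""
    return {
        instance: worker
        for worker, instances in assignments.items()
        for instance in instances
    }


def moved_instances(
    before: dict[str, list[str]], after: dict[str, list[str]]
) -> dict[str, tuple[str | None, str | None]]:
    """Instances whose worker differs between two assignment snapshots.

    Compares by membership, so list order inside a worker does not matter.
    An instance present in only one snapshot shows None on the other side.
    """
    old, new = instance_to_worker(before), instance_to_worker(after)
    return {
        instance: (old.get(instance), new.get(instance))
        for instance in sorted(old.keys() | new.keys())
        if old.get(instance) != new.get(instance)
    }


if __name__ == "__main__":
    worker = "c-pulsar-pcdc1-green-test-fw-fab09.example.domain.com-8080"
    snapshot = {
        worker: [f"amplitude/processing/random-{n}:{i}" for n in (1, 2) for i in range(24)]
    }
    # The before/after outputs in this issue are identical, so nothing moved.
    print(moved_instances(snapshot, snapshot))
```

An empty result, as for the two snapshots above, means no instance changed workers after the rebalance request.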

After some experimentation, I found that I could trigger a rebalance by sending the request to the function worker leader, but it's not clear whether this works consistently.
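The leader workaround can be scripted: ask any worker who the leader is, then send the rebalance PUT to that host. This is a minimal Python sketch using only the standard library, assuming the worker admin API also exposes `GET /admin/v2/worker/cluster/leader` returning a JSON object with `workerHostname` and `port` fields (the shape of Pulsar's `WorkerInfo`), that plain HTTP is used, and that the caller supplies the bearer token; check these against your Pulsar version. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

# Assumed worker admin paths: the rebalance path is the one used in this
# issue; the leader-lookup path is an assumption to verify on your version.
LEADER_PATH = "/admin/v2/worker/cluster/leader"
REBALANCE_PATH = "/admin/v2/worker/rebalance"


def auth_headers(token: str) -> dict[str, str]:
    """Bearer-token header, as in the curl commands above."""
    return {"Authorization": f"Bearer {token}"}


def leader_base_url(worker_info: dict, scheme: str = "http") -> str:
    """Base URL of the leader, built from a WorkerInfo-style dict.

    Assumes fields "workerHostname" and "port"; pass scheme="https"
    if the workers are behind TLS.
    """
    return f"{scheme}://{worker_info['workerHostname']}:{worker_info['port']}"


def rebalance_request(base_url: str, token: str) -> urllib.request.Request:
    """PUT request (no body) for the rebalance endpoint on base_url."""
    return urllib.request.Request(
        base_url + REBALANCE_PATH, method="PUT", headers=auth_headers(token)
    )


def rebalance_via_leader(any_worker_url: str, token: str, timeout: float = 10.0) -> int:
    """Look up the leader via any worker, then send the rebalance to it.

    Returns the HTTP status of the rebalance call; urllib raises
    HTTPError for 4xx/5xx responses.
    """
    lookup = urllib.request.Request(
        any_worker_url + LEADER_PATH, headers=auth_headers(token)
    )
    with urllib.request.urlopen(lookup, timeout=timeout) as resp:
        worker_info = json.load(resp)
    request = rebalance_request(leader_base_url(worker_info), token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

For example, `rebalance_via_leader("http://fab08.example.domain.com:8080", token)`, with `token` holding the same bearer token used in the curl commands. Running the assignments diff afterwards shows whether anything actually moved.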

Expected behavior

Triggering a function rebalance should work consistently regardless of which broker receives the request, and any failure should be reported in the logs. Currently, when the rebalance request is sent, the targeted broker logs nothing unless the rebalance succeeds. Additional logging should indicate when there is a problem on the broker that receives the rebalance request.
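One way to picture the expected behavior is a toy model in which any node accepts the request, a non-leader hands it to the leader, and every outcome other than success is logged. This is a hypothetical Python sketch, not Pulsar's implementation; `WorkerNode`, `handle_rebalance`, and `forward_to_leader` are illustrative names only.

```python
from __future__ import annotations

import logging
from typing import Callable, Optional

log = logging.getLogger("rebalance-sketch")


class WorkerNode:
    """Hypothetical worker that accepts rebalance requests on any node.

    Illustrates the behavior requested in this issue, not Pulsar code:
    the leader performs the rebalance, a follower forwards it, and a
    failure is logged instead of being dropped silently.
    """

    def __init__(
        self,
        worker_id: str,
        is_leader: bool,
        forward_to_leader: Optional[Callable[[], object]] = None,
    ) -> None:
        self.worker_id = worker_id
        self.is_leader = is_leader
        self.forward_to_leader = forward_to_leader
        self.rebalances_run = 0

    def handle_rebalance(self) -> str:
        if self.is_leader:
            self.rebalances_run += 1
            log.info("%s: leader running rebalance", self.worker_id)
            return "rebalanced"
        if self.forward_to_leader is None:
            log.warning(
                "%s: not the leader and no leader known; rebalance not performed",
                self.worker_id,
            )
            return "rejected"
        try:
            self.forward_to_leader()
        except Exception as exc:  # surface the failure in the log
            log.error("%s: failed to forward rebalance to leader: %s", self.worker_id, exc)
            return "failed"
        log.info("%s: forwarded rebalance request to leader", self.worker_id)
        return "forwarded"
```

With `logging.basicConfig(level=logging.INFO)`, a follower whose `forward_to_leader` is the leader's `handle_rebalance` logs the forward and the leader's run, while an unreachable leader produces an error line rather than silence.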

github-actions[bot] commented 2 years ago

The issue had no activity for 30 days, mark with Stale label.