All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 51.68%. Comparing base (b5045c5) to head (bf4ec8a). Report is 17 commits behind head on master.
/lgtm /assign @XiShanYongYe-Chang
@a7i nice finding! Thank you!
Currently we get in a loop of Service updates / Endpoint deletion/recreation because karmada-scheduler keeps rescheduling as it thinks placement has changed:
I understand this patch can avoid unnecessary rescheduling, but I don't get why this causes the endpoint deletion. In my opinion, the scheduler result should remain consistent between each loop of re-scheduling. @jwcesign Can you help to confirm that?
This should normally be OK, but karmada-scheduler does a reflect.DeepEqual here as opposed to checking the content of the list.
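For illustration, here is a minimal Go sketch (with hypothetical cluster names) of why reflect.DeepEqual on slices is order-sensitive, so two placements with the same clusters in a different order compare as unequal:

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	// Same cluster set, different iteration order
	// (e.g. as dumped from an unordered set).
	a := []string{"member1", "member2"}
	b := []string{"member2", "member1"}

	// reflect.DeepEqual compares slices element by element in order,
	// so it reports the placements as different even though the
	// content is identical.
	fmt.Println(reflect.DeepEqual(a, b)) // false
}
```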
By the way, a short patch for the reflect.DeepEqual would be to check the cluster affinity here:
https://github.com/karmada-io/karmada/blob/2bebae0701c3b6c21a08230d19c02b4d0e84690d/pkg/scheduler/helper.go#L56
But that still cannot compare the content of the list.
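One way to compare the content regardless of order is to sort copies of both lists before comparing. A minimal sketch, not the actual helper.go code, with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// equalIgnoringOrder is a hypothetical helper: it reports whether two
// cluster-name lists have the same content, ignoring order.
func equalIgnoringOrder(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	// Sort copies so the callers' slices are left untouched.
	sa := append([]string(nil), a...)
	sb := append([]string(nil), b...)
	sort.Strings(sa)
	sort.Strings(sb)
	return reflect.DeepEqual(sa, sb)
}

func main() {
	fmt.Println(equalIgnoringOrder(
		[]string{"member1", "member2"},
		[]string{"member2", "member1"},
	)) // true
}
```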
Let me take a look.
Perhaps it's the way we're using MultiClusterService: we want bidirectional network connectivity, so both clusters are listed as providers and consumers.
For example:

```yaml
---
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: istio-mcs-test
spec:
  types:
    - CrossCluster
  serviceProvisionClusters:
    - member1
    - member2
  serviceConsumptionClusters:
    - member1
    - member2
```
What we observed is that the Service is created in member2, then deleted, then created again, which causes the member cluster's kube-controller-manager (endpoint-controller) to delete and recreate the Endpoints.
(Please assume that the masked cluster name is member2.)
We will be debugging again tomorrow to identify any other issues with our setup or code.
I understand that this patch saves some rescheduling. ~It's a cleanup, not a bug fix. What do you think?~ @a7i
I think this is a bug: it brings unnecessary rescheduling, which would confuse users.
And we should backport this to previous releases as well.
/kind bug
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: To complete the pull request process, please ask for approval from rainbowmango after the PR has been reviewed.
The full list of commands accepted by this bot can be found here.
unfortunately, this did not solve the issue so still investigating
I'm getting involved too.
Hi @a7i, any progress?
Or is there a guide for me to do so that I can reproduce it exactly?
Hi @XiShanYongYe-Chang, my colleague @SerenaTiede-Zen and I will link up this week and provide the steps.
Hi @SerenaTiede-Zen If the related issue has not been reproduced, can we move forward with this PR based on the current code? The current revision is also a meaningful one.
/cc @a7i
@XiShanYongYe-Chang: GitHub didn't allow me to request PR reviews from the following users: a7i.
Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs.
Hi @XiShanYongYe-Chang , I explained our setup a bit in this Issue
@SerenaTiede-Zen and I paired on this today and the issue seems to be resolved on the latest code base. Our assumption is this change resolved it: https://github.com/karmada-io/karmada/pull/4818
Glad to hear the problem has been resolved, although I still don't understand what the problem was.
I will do my best to debug further next week and do a write up for your review
What type of PR is this?
/kind bug
What this PR does / why we need it: Karmada MultiClusterService uses an unsorted list of clusters which can change order over time (a set has no order guarantee). This should normally be OK, but karmada-scheduler does a reflect.DeepEqual here as opposed to checking the content of the list. My first reaction was to change karmada-scheduler not to do this, but perhaps the order of clusters has significance elsewhere. In a MultiClusterService provider/consumer list, however, the order has no significance as long as the cluster is available.
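A minimal sketch of the sorted-output approach, assuming the cluster names live in a k8s.io/apimachinery generic set (the values here are illustrative, and this is not the exact code of this PR):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

func main() {
	clusters := sets.New[string]("member2", "member1")

	// UnsortedList iterates the underlying map, so its order can differ
	// between calls — this is what makes reflect.DeepEqual flap.
	fmt.Println(clusters.UnsortedList())

	// sets.List returns the elements sorted, giving a stable order that
	// keeps repeated reconciles deterministic.
	fmt.Println(sets.List(clusters)) // [member1 member2]
}
```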
Currently we get in a loop of Service updates / Endpoint deletion/recreation because karmada-scheduler keeps rescheduling as it thinks placement has changed:
Which issue(s) this PR fixes: Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: