lxtywypc closed this pull request 1 year ago.
Welcome @lxtywypc! It looks like this is your first PR to karmada-io/karmada 🎉
Merging #2999 (71ffbf4) into master (7f2b660) will decrease coverage by 0.03%. The diff coverage is n/a.
```
@@            Coverage Diff             @@
##           master    #2999      +/-   ##
==========================================
- Coverage   51.63%   51.61%   -0.03%
==========================================
  Files         210      210
  Lines       18926    18926
==========================================
- Hits         9773     9769       -4
- Misses       8622     8625       +3
- Partials      531      532       +1
```
Flag | Coverage Δ | |
---|---|---|
unittests | 51.61% <ø> (-0.03%) :arrow_down: | |
Flags with carried forward coverage won't be shown. Click here to find out more.
see 2 files with indirect coverage changes
Hi @lxtywypc, the E2E has failed in the CI, have you tested it on your side?
Hmm... I don't know much about how the e2e tests work, but they may show a different error after each retry, and it seems to be OK after some retries.
The E2E is still failing after retrying. @XiShanYongYe-Chang
Hi @lxtywypc, I tested with this PR on my side. After I deployed Karmada, I found that the status condition of the cluster `member3` (Pull mode) is `Unknown`, and when I delete the old `karmada-controller-manager` pod, the logs of the new pod report a panic:
Ok, I will check again later
@XiShanYongYe-Chang @RainbowMango
It should be ok this time.
I used a for-select loop to make sure the informer had synced, and in the early version I made it sleep for 1s on each iteration until the sync had finished.
https://github.com/karmada-io/karmada/pull/2999/commits/4bbdf7470d510b4436f5ba2e19eaa6a1b55805b4#diff-a111978a679a9f03e49c67a43ecb913299d153e59ce48d2d30f5029f27ab601aR532
It seems that some calls to the resource interpreter come in early during this period, and that causes a nasty panic from a nil interpreter pointer.
This time I build the service informer and wait for it to sync before creating the resource interpreter. It works now.
But since the panic happened during that startup window, I think there may still be a potential panic risk if the interpreter takes too long to start. What do you think?
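For illustration, here is a minimal client-go sketch of the approach described above: start a Service informer and block on cache sync before constructing the component that needs the lister, instead of polling in a for-select loop. This is not the actual code from this PR; `newResourceInterpreter` and the setup details are hypothetical placeholders.

```go
package main

import (
	"context"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// newResourceInterpreter is a hypothetical placeholder for the component that
// needs a synced Service lister (e.g. to resolve webhook services).
func newResourceInterpreter(serviceLister corelisters.ServiceLister) corelisters.ServiceLister {
	return serviceLister
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Build a shared informer factory and a Service informer.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	serviceInformer := factory.Core().V1().Services()

	factory.Start(ctx.Done())

	// Block until the Service cache has synced, so the interpreter is never
	// constructed (or called) before its lister is ready.
	if !cache.WaitForCacheSync(ctx.Done(), serviceInformer.Informer().HasSynced) {
		panic("timed out waiting for Service informer to sync")
	}

	_ = newResourceInterpreter(serviceInformer.Lister())
}
```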
/assign
Thanks, @lxtywypc, can you help squash your commits into one?
Asking @ikaven1024 @RainbowMango to take another review.
/assign @ikaven1024 @RainbowMango
> Thanks, @lxtywypc, can you help squash your commits into one?
OK, but how about the vendor changes? @XiShanYongYe-Chang
It LGTM
/cc @RainbowMango
Asking @RainbowMango to help take a look.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Hi @lxtywypc, can you help post your test result here? A simple test process report is enough.
Hi @XiShanYongYe-Chang, I'm sorry for replying late. I'd be glad to put together a test report, but what kind of report would you like? Could you offer me a template?
> I'd be glad to put together a test report, but what kind of report would you like? Could you offer me a template?
Thanks~
I'm sorry, I don't have a specific template here either. My idea is that you can simply explain your local test process and post your test results here; that will be OK.
> My idea is that you can simply explain your local test process and post your test results here; that will be OK.
Ok, I'll do it later.
Here is the test report:
Note: Some company-specific information in the examples has been redacted.
We have a custom resource kind named `Ro`, and a resource-interpreter-webhook named `ro-webhook` in the cluster, which implements the `InterpretReplica` operation:
```
$ kubectl get svc -n karmada-system ro-webhook
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
ro-webhook   ClusterIP   10.7.93.168   <none>        443/TCP   196d

$ kubectl get resourceinterpreterwebhookconfiguration ro-webhook -o jsonpath='{"service: "}{.webhooks[0].clientConfig.service}{"\nrules: "}{.webhooks[0].rules}{"\n"}'
service: {"name":"ro-webhook","namespace":"karmada-system","path":"/interpret-ro","port":443}
rules: [{"apiGroups":["xxx.xxx"],"apiVersions":["v1"],"kinds":["Ro"],"operations":["InterpretReplica"]}]
```
There is a `Ro` workload named `test` with 0 replicas in namespace `test`, and `karmada-controller-manager` is running an image built from branch `master`:
```
$ kubectl get ro -n test test -o jsonpath='{.spec.replicas}{"\n"}'
0

$ kubectl get deploy -n karmada-system karmada-controller-manager -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
xxx.xxx/karmada/karmada-controller-manager:master
```
Then we scale the workload up to 3 and check its `replicas` in the `resourcebinding`:
```
$ kubectl scale ro -n test test --replicas=3
xxx.xxx/test scaled

$ kubectl get rb -n test test-ro -o jsonpath='{.spec.replicas}{"\n"}'

$
```
Then we look at the log of `karmada-controller-manager`:
```
$ kubectl exec -it karmada-controller-manager-6db886b97f-4fb5q -n karmada-system -- /bin/sh
/ # cd /var/log/karmada-controller-manager/
/var/log/karmada-controller-manager # cat runtime.log | grep test
I0420 07:08:18.189526 1 customized.go:74] Get replicas for object: xxx.xxx/v1, Kind=Ro test/test with webhook interpreter.
E0420 07:08:18.195101 1 detector.go:649] Failed to customize replicas for xxx.xxx/v1, Kind=Ro(test), Internal error occurred: failed calling webhook "ro-webhook/xxx.xxx": failed to call webhook: Post "https://ro-webhook.karmada-system.svc:443/interpret-ro?timeout=3s": dial tcp: lookup ro-webhook.karmada-system.svc on 10.5.168.10:53: no such host
E0420 07:08:18.195121 1 detector.go:472] Failed to build resourceBinding for object: xxx.xxx/v1, kind=Ro, test/test. error: Internal error occurred: failed calling webhook "ro-webhook/xxx.xxx": failed to call webhook: Post "https://ro-webhook.karmada-system.svc:443/interpret-ro?timeout=3s": dial tcp: lookup ro-webhook.karmada-system.svc on 10.5.168.10:53: no such host
I0420 07:08:18.195198 1 recorder.go:103] "events: Apply cluster policy(ro) failed: Internal error occurred: failed calling webhook \"ro-webhook/xxx.xxx\": failed to call webhook: Post \"https://ro-webhook.karmada-system.svc:443/interpret-ro?timeout=3s\": dial tcp: lookup ro-webhook.karmada-system.svc on 10.5.168.10:53: no such host" type="Warning" object={Kind:Ro Namespace:test Name:test UID:0f9fc62b-26ea-45da-8b18-085f1b167864 APIVersion:xxx.xxx/v1 ResourceVersion:643725715 FieldPath:} reason="ApplyPolicyFailed"
```
Then we update `karmada-controller-manager` to the image based on this pull request, and get the `replicas` in the `resourcebinding` again:
```
$ kubectl get deploy -n karmada-system karmada-controller-manager -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
xxx.xxx/karmada/karmada-controller-manager:master-with-resolver

$ kubectl get rb -n test test-ro -o jsonpath='{.spec.replicas}{"\n"}'
3
```
Let's look at the log of `karmada-controller-manager` again:
```
$ kubectl exec -it karmada-controller-manager-57dcd97579-k9shq -n karmada-system -- /bin/sh
/ # cd /var/log/karmada-controller-manager/
/var/log/karmada-controller-manager # cat runtime.log | grep test
I0420 07:32:17.723681 1 customized.go:77] Get replicas for object: xxx.xxx/v1, Kind=Ro test/test with webhook interpreter.
I0420 07:32:17.735193 1 detector.go:510] Update ResourceBinding(test-ro) successfully.
I0420 07:32:17.735240 1 recorder.go:103] "events: Apply cluster policy(ro) succeed" type="Normal" object={Kind:Ro Namespace:test Name:test UID:0f9fc62b-26ea-45da-8b18-085f1b167864 APIVersion:xxx.xxx/v1 ResourceVersion:643725715 FieldPath:} reason="ApplyPolicySucceed"
```
The webhook can be accessed successfully, and we haven't encountered any other problems so far.
/cc @XiShanYongYe-Chang @RainbowMango
@lxtywypc That's a clear test report, thanks a lot.
/lgtm
/hold cancel
Can you help update the release-note, such as:
karmada-controller-manager/karmada-agent: support connection to resourceInterpretWebhook without DNS Service
@XiShanYongYe-Chang I'm glad to, but I'm not clear on what I should do.
Should I cherry-pick this PR to other release branches and make tags?
> I'm glad to, but I'm not clear on what I should do.
Modifying this place will be OK:
> Should I cherry-pick this PR to other release branches and make tags?
We usually only backport bug fixes to previous versions, and the current change is more like a feature to us. Do you need to use it in previous versions?
I don't think we need to cherry-pick this to release branches.
@XiShanYongYe-Chang @RainbowMango I see. Thanks a lot.
What type of PR is this?
/kind feature
What this PR does / why we need it: If we deploy Karmada with the `Default` dnsPolicy, or deploy Karmada directly on physical or virtual machines, and there is no CoreDNS or other DNS service available, it cannot connect to resource-interpreter webhooks successfully with the default service resolver.
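To illustrate the idea (a hypothetical sketch, not the resolver actually added by this PR): the default resolver targets `<service>.<namespace>.svc`, which only works when cluster DNS is available, whereas a lister-backed resolver can build the webhook URL from the Service's ClusterIP so that no DNS service is needed.

```go
package resolver

import (
	"fmt"
	"net/url"

	corelisters "k8s.io/client-go/listers/core/v1"
)

// clusterIPServiceResolver resolves a webhook Service to a URL using the
// Service's ClusterIP from the informer cache instead of a DNS name.
// Hypothetical sketch for illustration only.
type clusterIPServiceResolver struct {
	serviceLister corelisters.ServiceLister
}

// ResolveEndpoint returns e.g. https://10.7.93.168:443 rather than
// https://ro-webhook.karmada-system.svc:443, so no cluster DNS is required.
func (r *clusterIPServiceResolver) ResolveEndpoint(namespace, name string, port int32) (*url.URL, error) {
	svc, err := r.serviceLister.Services(namespace).Get(name)
	if err != nil {
		return nil, fmt.Errorf("failed to get service %s/%s: %v", namespace, name, err)
	}
	if svc.Spec.ClusterIP == "" || svc.Spec.ClusterIP == "None" {
		return nil, fmt.Errorf("service %s/%s has no usable ClusterIP", namespace, name)
	}
	return &url.URL{
		Scheme: "https",
		Host:   fmt.Sprintf("%s:%d", svc.Spec.ClusterIP, port),
	}, nil
}
```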
Which issue(s) this PR fixes: Fixes #2998
Special notes for your reviewer: NONE
Does this PR introduce a user-facing change?: