karmada-io / karmada

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
https://karmada.io
Apache License 2.0
4.14k stars 813 forks source link

Migrate deprecated wait.Poll function #3906

Open Affan-7 opened 9 months ago

Affan-7 commented 9 months ago

What type of PR is this? /kind cleanup

What this PR does / why we need it: This PR replaces the deprecated wait.Poll() function with the wait.PollUntilContextTimeout() function.

Which issue(s) this PR fixes: Related to #3835 1 out of 4 tasks

Does this PR introduce a user-facing change?:

NONE
karmada-bot commented 9 months ago

Welcome @Affan-7! It looks like this is your first PR to karmada-io/karmada 🎉

carlory commented 9 months ago

/lgtm

/assign @rainbowmango

for workflow and approval.

codecov-commenter commented 9 months ago

Codecov Report

Merging #3906 (2954bcb) into master (e5277b6) will decrease coverage by 0.14%. Report is 30 commits behind head on master. The diff coverage is 100.00%.

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master    #3906      +/-   ##
==========================================
- Coverage   54.83%   54.70%   -0.14%     
==========================================
  Files         228      229       +1     
  Lines       21848    22074     +226     
==========================================
+ Hits        11981    12075      +94     
- Misses       9227     9353     +126     
- Partials      640      646       +6     
Flag Coverage Δ
unittests 54.70% <100.00%> (-0.14%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
pkg/util/serviceaccount.go 90.27% <100.00%> (ø)

... and 6 files with indirect coverage changes

karmada-bot commented 9 months ago

New changes are detected. LGTM label has been removed.

Affan-7 commented 9 months ago

The tests were failing, here are the logs: https://github.com/karmada-io/karmada/actions/runs/5783869909/job/15676511801#step:5:2635

Should we increase the default value of j.Wait? Or there is some other problem?

I can't run e2e tests locally, because they are too slow on my computer.

RainbowMango commented 9 months ago

It might be flak, let's give it another try.

RainbowMango commented 9 months ago

Should we increase the default value of j.Wait? Or there is some other problem?

I'm not sure yet. let's give it another try and try to find the root cause.

I'm curious that the test failed in 42 seconds, but the default waiting time(j.Wait) is 60 seconds.

Just echo the log here: https://github.com/karmada-io/karmada/actions/runs/5787607401/job/15697196471?pr=3906#step:5:2723

[FAILED] [42.440 seconds]
[cluster joined] reschedule testing Deployment propagation testing [AfterEach] testing clusterAffinity of the policy when the ReplicaScheduling of the policy is nil, reschedule testing [needCreateCluster]
  [AfterEach] /home/runner/work/karmada/karmada/test/e2e/rescheduling_test.go:176
  [It] /home/runner/work/karmada/karmada/test/e2e/rescheduling_test.go:271

  Captured StdOut/StdErr Output >>
  I0808 02:23:49.855080   46074 deployment.go:199] The ResourceBinding(karmadatest-2dj87/deploy-tjnp2-deployment) schedule result is: member1,member2
  cluster(member-e2e-lhr) is joined successfully
  I0808 02:23:59.098817   46074 deployment.go:199] The ResourceBinding(karmadatest-2dj87/deploy-tjnp2-deployment) schedule result is: member-e2e-lhr,member1,member2
  I0808 02:23:59.109167   46074 deployment.go:199] The ResourceBinding(karmadatest-2dj87/deploy-tjnp2-deployment) schedule result is: member-e2e-lhr,member1,member2
  E0808 02:23:59.135302   46074 unjoin.go:239] Failed to delete cluster object. cluster name: member-e2e-lhr, error: context deadline exceeded
  E0808 02:23:59.136891   46074 unjoin.go:174] Failed to delete cluster object. cluster name: member-e2e-lhr, error: context deadline exceeded
RainbowMango commented 9 months ago

/remove-approve We need to figure out the root cause, probably affected by https://github.com/kubernetes/kubernetes/issues/119533

karmada-bot commented 9 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please ask for approval from rainbowmango after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/karmada-io/karmada/blob/master/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
Affan-7 commented 9 months ago

Here is what I did:

Summary:

Value of j.Wait was zero in the both cases wait.Poll and wait.PollUntilContextTimeout(). Since the wait.Poll() is deprecated it no longer returns any errors. That's why wait.Poll() was passing the test and wait.PollUntilContextTimeout() was failing. I specified the value of j.Wait explicitly under the test file and everything works fine as expected.


Longer explanation:

I added fmt.Printf("+++++ value of j.Wait: %v\n", j.Wait) before wait.PollUntilContextTimeout(context.TODO(), 1*time.Second, j.Wait, .....). And I ran the test on the my fork of Karmada, using github actions. I was surprised that the value of j.Wait was zero.

Reference: https://github.com/Affan-7/karmada/actions/runs/5841035785/job/15840850551#step:5:2798

So I run the test again with the deprecated wait.Poll() method. But the test passed! Then I ran it again with fmt.Printf("+++++ value of j.Wait: %v\n", j.Wait) and the value of j.Wait was also zero.

Reference: https://github.com/Affan-7/karmada/actions/runs/5841648595/job/15842010633#step:5:2999

The value of j.Wait was still zero but the test case weren't failing. Because the wait.Poll() method is deprecated and doesn't return errors anymore. I found this in the documentation of wait.Poll.

Deprecated: This method does not return errors from context, use PollWithContextTimeout. Note that the new method will no longer return ErrWaitTimeout and instead return errors defined by the context package. Will be removed in a future release.

Reference: https://pkg.go.dev/k8s.io/apimachinery/pkg/util/wait#Poll

I was assuming that the default value of j.Wait should be 60 because of flags.DurationVar(&j.Wait, "wait", 60*time.Second, ....) under the unjoin.go. But this block of code doesn't get executed during the test (because it's for CLI). Therefore the value of 'j.Wait' remains zero.

So I added the value of j.Wait explicitly under rescheduling_test.go by using Wait: 60 * time.Second,. It's very similar to the Wait: 5 * options.DefaultKarmadactlCommandDuration, on the line 441 of the karmadactl_test.go.

After specifying the value of j.Wait under the rescheduling_test.go all the tests were passing successfully. Reference: https://github.com/Affan-7/karmada/actions/runs/5841642875/job/15841998395#step:5:3461

Affan-7 commented 9 months ago

Can you please merge this @RainbowMango :blush:. Or is there something that should I improve?

RainbowMango commented 9 months ago

Thanks @Affan-7 for the excellent analysis, I think it is acceptable to set a timeout here.

I wonder if this is affected by https://github.com/kubernetes/kubernetes/issues/119533?

I'm hesitant to do it before https://github.com/kubernetes/kubernetes/issues/119533.

RainbowMango commented 1 week ago

Hi @Affan-7 We can get back on this now as we already updated the Kubernetes dependencies to v1.29+ which includes the fix we want. Please feel free to let me know if you can work on this.