python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0
1.3k stars 386 forks source link

Make tests less flaky on CI #716

Closed StephenSorriaux closed 1 year ago

StephenSorriaux commented 1 year ago

Why is this needed?

This makes the tests way less flaky, I hope (and first results confirmed it... but you never know).

Proposed Changes

Does this PR introduce any breaking change?

Nope.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage has no change and project coverage change: +0.36 :tada:

Comparison is base (ffa0ae9) 96.28% compared to head (a41f564) 96.65%.

:exclamation: Current head a41f564 differs from pull request most recent head 64af698. Consider uploading reports for the commit 64af698 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #716 +/- ## ========================================== + Coverage 96.28% 96.65% +0.36% ========================================== Files 27 27 Lines 3554 3554 ========================================== + Hits 3422 3435 +13 + Misses 132 119 -13 ``` [see 7 files with indirect coverage changes](https://codecov.io/gh/python-zk/kazoo/pull/716/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk) Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

jeffwidman commented 1 year ago

I ❀️ the effort, and the changes look mostly good, but there's a lot of disparate changes in here... would it be possible at a minimum to separate some of them out to distinct commits, at ideally to separate PR's?

Benefits:

A few obvious ones that should be easy to peel off:

In fact TBH, most of those bullet points look like obvious candidates for distinct PRs.

I'm happy to πŸ™‹ volunteer to review the separate PR's and try to do so quickly so you're not blocked / slow to get them merged.

StephenSorriaux commented 1 year ago

Thanks Jeff for taking the time. I will split my stuff.

jeffwidman commented 1 year ago
  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

ceache commented 1 year ago

I'll try to review this thoroughly this week. Sorry I have been quite distracted recently.

Besides the flakiness within the tests themselves, i was looking at the harness recently and I wanted to know how you felt about swapping our custom zookeeper ensemble orchestration driver for some docker compose driver. This would remove a lot of complexity and allow us to know we are starting the tests only after the ensemble is actually up and ready. I know how to craft such a compose definition and used it numerous times for my own testing. The down side is introducing a dependency on docker for local testing (or we could leave the "native" harness around for docker-less development)

What do you think?

On Tue, Apr 11, 2023, 18:05 Stephen Sorriaux @.***> wrote:

@.**** commented on this pull request.

In kazoo/testing/harness.py https://github.com/python-zk/kazoo/pull/716#discussion_r1163376507:

  • client_cluster_health.start()
  • client_cluster_health.ensure_path("/")
  • client_cluster_health.stop()
  • self.log(logging.INFO, "cluster looks ready to go")
  • break
  • except Exception:
  • tries += 1
  • if tries >= MAX_INIT_TRIES:
  • raise
  • if tries > 0 and tries % 2 == 0:
  • self.log(
  • logging.WARNING,
  • "nuking current cluster and making another one",
  • )
  • self.cluster.terminate()
  • self.cluster.start()

Yes, there is already some sleep() hidden in cluster.start() ( https://github.com/python-zk/kazoo/blob/master/kazoo/testing/common.py#L370 )

β€” Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/716#discussion_r1163376507, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHUSQS3TH5DKY3SMGM3XAXIQ7ANCNFSM6AAAAAAWHB5WWQ . You are receiving this because you were mentioned.Message ID: @.***>

StephenSorriaux commented 1 year ago
  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

Yes, I think we should consider looking into it, especially what to expect of the retry when used as a connection_retry parameter for the client

StephenSorriaux commented 1 year ago

I'll be all in for a docker based harness! I personally have java installed on my desktop just for kazoo...

jeffwidman commented 1 year ago

I would 110% support a docker-based harness. No problem adding that as a test/dev dependency, it's almost certainly more accessible to drive-by devs as well who are going to be familiar with that from other libraries doing similar things.

StephenSorriaux commented 1 year ago
  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

Yes, I think we should consider looking into it, especially what to expect of the retry when used as a connection_retry parameter for the client

Update: I think https://github.com/python-zk/kazoo/pull/685 tried to tackle the "issue" I mentionned

ceache commented 1 year ago

I will try to rebase my PR, now that this is merged.

Would there be anything holding it back ? I understood Jeff's comment to mean he thought it was ok.

On Mon, Apr 24, 2023, 21:55 Stephen Sorriaux @.***> wrote:

  • I added a "custom" retry logic without using a KazooRetry since KazooRetry does not handle refused/dropped connections like I would want too: even if a connection_retry is given to KazooClient(), the retry configuration is actually not respected because the interrupt() end up being triggered anyway (ie. the client._stopped event)

This is fine for this PR, but is this something we should consider changing/fixing in KazooClient/KazooRetry long-term?

Yes, I think we should consider looking into it, especially what to expect of the retry when used as a connection_retry parameter for the client

Update: I think #685 https://github.com/python-zk/kazoo/pull/685 tried to tackle the "issue" I mentionned

β€” Reply to this email directly, view it on GitHub https://github.com/python-zk/kazoo/pull/716#issuecomment-1520744338, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIFTHVM7SYTPRBYKHIHEWTXC3LERANCNFSM6AAAAAAWHB5WWQ . You are receiving this because you were mentioned.Message ID: @.***>

StephenSorriaux commented 1 year ago

Sorry for the rebase, test_connection.py might give you some conflict...

I think I owe you a review on https://github.com/python-zk/kazoo/pull/685, will take the time now!