python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0
1.3k stars 386 forks source link

Fix possible endless wait in stop() after AUTH_FAILED error #688

Closed azat closed 1 year ago

azat commented 1 year ago

In case of AUTH_FAILED in the zk-loop thread it will call client._session_callback which will reset the queue.

However another thread can add to this queue CloseInstance event, and if the _session_callback() will be called after CloseInstance was added to the queue, then stop() will never return (and zk-loop will endlessly spin).

Here is how it looks like with addititional logging:

39: [ Thread-3 (zk_loop) ] INFO: client.py:568: _session_callback: Zookeeper session closed, state: AUTH_FAILED
39: [ MainThread ] Level 5: client.py:721: stop: Sending CloseInstance
39: [ Thread-3 (zk_loop) ] Level 5: client.py:403: _reset: Reseting the client
39: [ Thread-3 (zk_loop) ] Level 5: connection.py:625: _connect_attempt: Connecting
39: [ Thread-3 (zk_loop) ] Level 5: connection.py:625: _connect_attempt: Connecting

You can find details in this gist 1.

a-ungurianu commented 1 year ago

Hey!

This makes sense in theory. Could you rebase in master so the tests rerun. Would appreciate a review from someone else, as I'm not super familiar with the lifetime cycle of the client

azat commented 1 year ago

Rebased, but now it requires workflow approval

codecov[bot] commented 1 year ago

Codecov Report

Base: 94.65% // Head: 94.70% // Increases project coverage by +0.04% :tada:

Coverage data is based on head (8b660fb) compared to base (92b071d). Patch coverage: 100.00% of modified lines in pull request are covered.

:exclamation: Current head 8b660fb differs from pull request most recent head cd4a2b2. Consider uploading reports for the commit cd4a2b2 to get more accurate results

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #688 +/- ## ========================================== + Coverage 94.65% 94.70% +0.04% ========================================== Files 57 57 Lines 8338 8340 +2 ========================================== + Hits 7892 7898 +6 + Misses 446 442 -4 ``` | [Impacted Files](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk) | Coverage Δ | | |---|---|---| | [kazoo/protocol/connection.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vcHJvdG9jb2wvY29ubmVjdGlvbi5weQ==) | `96.07% <100.00%> (-0.62%)` | :arrow_down: | | [kazoo/tests/test\_client.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vdGVzdHMvdGVzdF9jbGllbnQucHk=) | `98.80% <100.00%> (+<0.01%)` | :arrow_up: | | [kazoo/tests/test\_gevent\_handler.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vdGVzdHMvdGVzdF9nZXZlbnRfaGFuZGxlci5weQ==) | `77.46% <0.00%> (-2.12%)` | :arrow_down: | | [kazoo/handlers/eventlet.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vaGFuZGxlcnMvZXZlbnRsZXQucHk=) | `99.01% <0.00%> (-0.99%)` | :arrow_down: | | [kazoo/testing/harness.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vdGVzdGluZy9oYXJuZXNzLnB5) | `97.52% <0.00%> (-0.83%)` | :arrow_down: | | [kazoo/handlers/utils.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vaGFuZGxlcnMvdXRpbHMucHk=) | `94.06% <0.00%> (-0.46%)` | :arrow_down: | | [kazoo/tests/test\_utils.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vdGVzdHMvdGVzdF91dGlscy5weQ==) | `93.33% <0.00%> (ø)` | | | [kazoo/tests/test\_\_connection.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vdGVzdHMvdGVzdF9fY29ubmVjdGlvbi5weQ==) | `98.14% <0.00%> (ø)` | | | [kazoo/tests/test\_partitioner.py](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk#diff-a2F6b28vdGVzdHMvdGVzdF9wYXJ0aXRpb25lci5weQ==) | `98.69% <0.00%> (ø)` | | | ... and [6 more](https://codecov.io/gh/python-zk/kazoo/pull/688?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk) | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=python-zk)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

azat commented 1 year ago

Looks like all failures are unrelated, since the last error in the log is:

==> Uploading
    .url https://codecov.io/
    .query commit=d9a20a766ba942b5722ee36a1210e62fbb0ae9f8&branch=HEAD&package=py2.1.12
    Gzipping contents..
    Compressed contents to 18443 bytes
    Pinging Codecov...
Error: Could not determine repo and owner
Tip: See all example repositories: https://github.com/codecov?query=example
.pkg: _exit> python /opt/hostedtoolcache/Python/3.10.9/x64/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta
  py310-gevent-eventlet-sasl: FAIL code 1 (171.97=setup[19.90]+cmd[152.06] seconds)
  codecov: OK (14.40=setup[13.73]+cmd[0.66] seconds)
  evaluation failed :( (187.83 seconds)
Error: Process completed with exit code 255.
StephenSorriaux commented 1 year ago

Yes you are right, we have some flaky tests...

azat commented 1 year ago

So, any news/feedback on this one?

jeffwidman commented 1 year ago

@azat can you check the "allow edits from maintainers" box? I can't rebase to merge w/o that... or you can simply rebase it yourself, up to you.

azat commented 1 year ago

@jeffwidman I don't have Allow edits from mainters box due to the github issues for pull requests from organizations, and I completely forgot about this, next time will keep it in mind (maybe will do this from my main account).

Thanks!

jeffwidman commented 1 year ago

Gotcha, I didn't even realize that was a functionality gap as I always contribute from my personal GitHub... makes sense.