Closed link2xt closed 1 year ago
Thanks for providing the logs and the link to the project sources.
Unfortunately, I cannot reproduce the error locally. After building deltachat-rpc-server I was able to run tox run -- -k test_system_info
successfully multiple times on Python 3.7 and Python 3.11.
From the top of my head there are two things you can do. If the error occurs with Python 3.7 only, you may consider testing with Python 3.8. Major parts of asyncio have been rewritten in Python 3.8 and Python 3.7 will reach end-of-life in roughly 6 weeks.
Alternatively, you can try to isolate the error in a simpler example. Your setup is quite complex and involves building executables and creating online accounts to run the tests. If you can create a minimal example that reproduces the issue, chances are much higher that we can help you.
I also cannot reproduce it locally, it happens randomly on GitHub Actions. Maybe GitHub runner pauses the processes or otherwise triggers race conditions that do not happen when you run the tests on a dev machine.
I mostly hoped that something can be figured out of the backtrace.
This seems to be somehow related to having a subprocess, and deltachat-rpc-client Python library indeed starts a deltachat-rpc-server Rust binary as a subprocess.
The backtrace in a better form:
_UnixDefaultEventLoopPolicy.set_event_loop
, call self._watcher.attach_loop(loop)
BaseChildWatcher.attach_loop()
, call self._do_waitpid_all()
SafeChildWatcher._do_waitpid_all()
, call self._do_waitpid(pid)
SafeChildWatcher._do_waitpid()
, call callback(pid, returncode, *args)
_UnixSelectorEventLoop._child_watcher_callback()
, call self.call_soon_threadsafe(transp._process_exited, returncode)
BaseEventLoop.call_soon_threadsafe()
: self._check_closed()So attach_loop of the child watcher is called here: https://github.com/python/cpython/blob/v3.7.16/Lib/asyncio/unix_events.py#L1134
Then new loop is set and _do_waitpid_all
is called: https://github.com/python/cpython/blob/v3.7.16/Lib/asyncio/unix_events.py#L887-L889
Note the comment about the race condition.
Apparently this race condition happens and a child indeed terminated but was not handled while previous loop was set. So we are handling it now, right after switching to a new loop.
A child handler callback is called, which is actually a _child_watcher_callback
. This callback only calls self.call_soon_threadsafe
which expects the current loop to be open. And it indeed should be open because we have just created and set it. I don't understand what is wrong there.
There is indeed something different about how setting event loop is handled in Python 3.7. I have created an example:
import subprocess
import asyncio
import time
import signal
async def main():
policy = asyncio.get_event_loop_policy()
new_loop = policy.new_event_loop()
subprocess = await asyncio.create_subprocess_exec("sleep", "5")
policy.set_event_loop(new_loop)
if __name__ == "__main__":
asyncio.run(main())
This produces a warning "RuntimeWarning: A loop is being detached from a child watcher with pending handlers" with Python 3.7, but not with Python 3.8.
The warning about loop being detached is produced when the loop is set to None
:
https://github.com/python/cpython/blob/v3.7.16/Lib/asyncio/unix_events.py#L874-L878
So somehow with Python 3.7 policy.set_event_loop()
is setting the loop
to None
in this case. This is different from the bug observed in the log though, where the loop is not None
, but closed. Maybe old loop is used for some reason.
One difference I see comparing 3.8.0 to 3.7.16 is that default child watcher is changed to ThreadedChildWatcher
. As of 3.11 it is still the default child watcher: https://docs.python.org/3.11/library/asyncio-policy.html
The change was introduced in https://github.com/python/cpython/pull/14344
It is indeed a big difference, I will try to replace Python 3.7 with Python 3.8 as the minimum supported version and see if this error appears.
As for pytest-asyncio, maybe it is possible to somehow avoid setting global/thread-local variables and make it less stateful to avoid such problems on the architectural level? Just a thought, I did not even read the whole asyncio
documentation.
I am going to drop support for Python 3.7 at https://github.com/deltachat/deltachat-core-rust/pull/4410
If getting rid of set_event_loop()
call is not feasible, feel free to close the issue.
Thanks for the thorough investigation and summary!
This sounds like a duplicate of #114. The issue was closed a couple of months ago, because it's unlikely that anyone is going to backport the asyncio.ThreadedChildWatcher to Python 3.7.
As for pytest-asyncio, maybe it is possible to somehow avoid setting global/thread-local variables and make it less stateful to avoid such problems on the architectural level? Just a thought, I did not even read the whole asyncio documentation.
If getting rid of set_event_loop() call is not feasible, feel free to close the issue.
Right now, pytest-asyncio uses _get_eventloop() during setup of the _eventloop fixture. Unfortunately, _get_eventloop creates a new loop when no loop has been set or returns the old loop when _set_eventloop was called. The goal is to get rid of both _get_eventloop and _set_eventloop calls, especially due to recent changes to get_event_loop. [0]
The problem is that pytest-asyncio currently uses _get_event_loop to clean up unclosed event loops for the users. Technically, this is "bad loop hygene", but we cannot just remove the functionality without warning. Therefore, pytest-asyncio emits a DeprecationWarning since v0.21 when it closes an event loop for the user. The plan is to remove that functionality in future versions of pytest-asyncio.
I'll close this as a duplicate of #114. Feel free to continue the discussion, though.
The error sometimes happens during the test teardown.
Here is an example of a failure: https://github.com/deltachat/deltachat-core-rust/actions/runs/5022838715/jobs/9006825246
Slightly cleaned up (removed
git clone
progress) log:Failure log
``` 2023-05-19T09:30:28.1963672Z Requested labels: ubuntu-latest 2023-05-19T09:30:28.1964098Z Job defined at: deltachat/deltachat-core-rust/.github/workflows/ci.yml@refs/heads/master 2023-05-19T09:30:28.1964303Z Waiting for a runner to pick up this job... 2023-05-19T09:30:28.4092092Z Job is waiting for a hosted runner to come online. 2023-05-19T09:30:32.4334700Z Job is about to start running on the hosted runner: GitHub Actions 12 (hosted) 2023-05-19T09:30:35.3312448Z Current runner version: '2.304.0' 2023-05-19T09:30:35.3347520Z ##[group]Operating System 2023-05-19T09:30:35.3348307Z Ubuntu 2023-05-19T09:30:35.3348674Z 22.04.2 2023-05-19T09:30:35.3349170Z LTS 2023-05-19T09:30:35.3349597Z ##[endgroup] 2023-05-19T09:30:35.3350020Z ##[group]Runner Image 2023-05-19T09:30:35.3350496Z Image: ubuntu-22.04 2023-05-19T09:30:35.3350915Z Version: 20230507.1 2023-05-19T09:30:35.3351980Z Included Software: https://github.com/actions/runner-images/blob/ubuntu22/20230507.1/images/linux/Ubuntu2204-Readme.md 2023-05-19T09:30:35.3352827Z Image Release: https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20230507.1 2023-05-19T09:30:35.3353982Z ##[endgroup] 2023-05-19T09:30:35.3354554Z ##[group]Runner Image Provisioner 2023-05-19T09:30:35.3355023Z 2.0.171.1 2023-05-19T09:30:35.3355372Z ##[endgroup] 2023-05-19T09:30:35.3357299Z ##[group]GITHUB_TOKEN Permissions 2023-05-19T09:30:35.3358333Z Actions: write 2023-05-19T09:30:35.3358772Z Checks: write 2023-05-19T09:30:35.3359528Z Contents: write 2023-05-19T09:30:35.3360103Z Deployments: write 2023-05-19T09:30:35.3360573Z Discussions: write 2023-05-19T09:30:35.3361006Z Issues: write 2023-05-19T09:30:35.3361408Z Metadata: read 2023-05-19T09:30:35.3361871Z Packages: write 2023-05-19T09:30:35.3362228Z Pages: write 2023-05-19T09:30:35.3362659Z PullRequests: write 2023-05-19T09:30:35.3363121Z RepositoryProjects: write 2023-05-19T09:30:35.3363653Z SecurityEvents: write 2023-05-19T09:30:35.3364091Z Statuses: write 2023-05-19T09:30:35.3364471Z ##[endgroup] 2023-05-19T09:30:35.3369362Z Secret source: Actions 2023-05-19T09:30:35.3370021Z Prepare workflow directory 2023-05-19T09:30:35.4547198Z Prepare all required actions 2023-05-19T09:30:35.4808637Z Getting action download info 2023-05-19T09:30:35.7249563Z Download action repository 'actions/checkout@v3' (SHA:8e5e7e5ab8b370d6c329ec480221332ada57f0ab) 2023-05-19T09:30:36.5159282Z Download action repository 'actions/setup-python@v4' (SHA:57ded4d7d5e986d7296eab16560982c6dd7c923b) 2023-05-19T09:30:37.1693596Z Download action repository 'actions/download-artifact@v3' (SHA:9bc31d5ccc31df68ecc42ccf4149144866c47d8a) 2023-05-19T09:30:37.8234314Z Complete job name: Async Python tests (ubuntu-latest, 3.7) 2023-05-19T09:30:37.9510800Z ##[group]Run actions/checkout@v3 2023-05-19T09:30:37.9511610Z with: 2023-05-19T09:30:37.9511972Z repository: deltachat/deltachat-core-rust 2023-05-19T09:30:37.9512906Z token: *** 2023-05-19T09:30:37.9513264Z ssh-strict: true 2023-05-19T09:30:37.9513743Z persist-credentials: true 2023-05-19T09:30:37.9514119Z clean: true 2023-05-19T09:30:37.9514571Z fetch-depth: 1 2023-05-19T09:30:37.9514847Z lfs: false 2023-05-19T09:30:37.9515170Z submodules: false 2023-05-19T09:30:37.9515580Z set-safe-directory: true 2023-05-19T09:30:37.9515948Z env: 2023-05-19T09:30:37.9516336Z RUSTFLAGS: -Dwarnings 2023-05-19T09:30:37.9516620Z ##[endgroup] 2023-05-19T09:30:38.3231535Z Syncing repository: deltachat/deltachat-core-rust 2023-05-19T09:30:38.3233732Z ##[group]Getting Git version info 2023-05-19T09:30:38.3234399Z Working directory is '/home/runner/work/deltachat-core-rust/deltachat-core-rust' 2023-05-19T09:30:38.3235058Z [command]/usr/bin/git version 2023-05-19T09:30:38.3380949Z git version 2.40.1 2023-05-19T09:30:38.3422401Z ##[endgroup] 2023-05-19T09:30:38.3452555Z Temporarily overriding HOME='/home/runner/work/_temp/391bc45e-d9af-41b1-942e-e25cfd34c11b' before making global git config changes 2023-05-19T09:30:38.3453724Z Adding repository directory to the temporary git global config as a safe directory 2023-05-19T09:30:38.3454982Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/deltachat-core-rust/deltachat-core-rust 2023-05-19T09:30:38.3522544Z Deleting the contents of '/home/runner/work/deltachat-core-rust/deltachat-core-rust' 2023-05-19T09:30:38.3527848Z ##[group]Initializing the repository 2023-05-19T09:30:38.3534027Z [command]/usr/bin/git init /home/runner/work/deltachat-core-rust/deltachat-core-rust 2023-05-19T09:30:38.3712980Z hint: Using 'master' as the name for the initial branch. This default branch name 2023-05-19T09:30:38.3713911Z hint: is subject to change. To configure the initial branch name to use in all 2023-05-19T09:30:38.3715558Z hint: of your new repositories, which will suppress this warning, call: 2023-05-19T09:30:38.3715990Z hint: 2023-05-19T09:30:38.3716460Z hint: git config --global init.defaultBranch