Open sebastiangonsal opened 5 years ago
I believe I am running into a similar issue. I start a web-server on a separate thread
in my python script, and the thread just seems to time out and will not handle requests. Minimal example:
conda create -n r-reticulate python=3.9
conda activate r-reticulate
pip install uvicorn starlette
import uvicorn
import starlette
import sys
print(f"# {sys.version=}")
print(f"# {uvicorn.__version__=}")
print(f"# {starlette.__version__=}")
# sys.version='3.9.12 (main, Jun 1 2022, 06:39:55) \n[Clang 12.0.0 ]'
# uvicorn.__version__='0.18.2'
# starlette.__version__='0.20.4'
from starlette.applications import Starlette
from starlette.routing import Route
from starlette.responses import JSONResponse
import threading
import uvicorn
async def homepage(request):
    return JSONResponse({'hello': 'world'})
config = uvicorn.Config(
    app=Starlette(debug=True, routes=[Route('/', homepage)]),
    port=8000,
    log_level="debug",
)
server = uvicorn.Server(config=config)
server_thread = threading.Thread(target=server.run, daemon=True)
reticulate::py$server_thread$start() # start server in background thread
Then navigate to http://localhost:8000 in browser.
Expected: webpage loads with { hello: world } JSON response.
Actual: request just hangs and the server never fulfills the request. Running the app on the main thread with reticulate::py$server$run() works just fine, so I'm thinking this is very likely an issue with how threads are handled in reticulate.
platform       x86_64-apple-darwin17.0
arch           x86_64
os             darwin17.0
system         x86_64, darwin17.0
status
major          4
minor          1.2
year           2021
month          11
day            01
svn rev        81115
language       R
version.string R version 4.1.2 (2021-11-01)
nickname       Bird Hippie
I'm having issues with multithreading as well.
It's possible there's an issue with Python multithreading here, and an issue with ASGI servers (like uvicorn above), possibly related.
I'm happy to help out with a fix, but I'm not sure what the issue is at the moment.
Ideas?
CC @t-kalinowski @kevinushey
A "basic" threading example, as e.g. from here, seems to work for me:
import time
import threading
def hello():
    for i in range(3):
        print(f"Hello from thread! This is iteration {i}")
        time.sleep(1)
thread = threading.Thread(target = hello)
thread.start()
thread.join()
So we probably need some more information in order to dive into why this particular example doesn't work as expected. Preferably, with a reproducible example depending only on threading or other "base" Python modules.
That said, my guess here is that we need to make sure that R runs the Python event loop whenever R_ProcessEvents is called. Right now we do some work to make sure the R event loop runs when Python is busy here:
https://github.com/rstudio/reticulate/blob/main/src/event_loop.cpp
But maybe we need a similar analogue in the other direction?
Thanks for the quick response @kevinushey
I think you're right that this is more than a multithreading issue; I was incorrect.
> So we probably need some more information in order to dive into why this particular example doesn't work as expected. Preferably, with a reproducible example depending only on threading or other "base" Python modules.
The above example by @manzt uses https://github.com/encode/uvicorn which (as far as I can tell) just relies on asyncio in CPython to create a Python framework for an ASGI server. That leads us back to Py_AddPendingCall, I think: https://docs.python.org/3/c-api/init.html#asynchronous-notifications
> But maybe we need a similar analogue in the other direction?
This sounds promising. How... would you do that?
> This sounds promising. How... would you do that?
I do not know :-) This will take some extra investigation; I'm not familiar with the internals of asyncio (or if there are extra considerations to be aware of for applications embedding Python).
Thanks for the quick response.
Here's a more contrived example which I think illustrates the issue. This program creates two files, ping.txt and pong.txt. The main thread writes ping to the ping.txt file, while another thread runs an infinite loop that watches for changes and writes pong to pong.txt. This program runs fine with python ping-pong.py but completely hangs in R.
import time
import threading
import pathlib
ping = pathlib.Path.cwd() / "ping.txt"
ping.touch()
should_exit = False
def listen():
    pong = pathlib.Path.cwd() / "pong.txt"

    # wait for file to be created
    while not ping.is_file():
        time.sleep(0.5)

    with ping.open(mode="r") as file:
        prev_contents = file.read()

    count = 0
    while not should_exit:
        time.sleep(0.5)

        with ping.open(mode="r") as file:
            contents = file.read()

        if prev_contents != contents:
            prev_contents = contents
            with pong.open(mode="a") as out:
                out.write(f"{count} pong\n")
            count += 1

thread = threading.Thread(target=listen)
thread.start()

for i in range(4):
    with ping.open(mode="a") as file:
        file.write(f"{i} ping\n")
    time.sleep(3)

should_exit = True
thread.join()
I believe this is similar because uvicorn.Server.run is blocking and runs an infinite loop on a separate thread. Under the hood, uvicorn.Server.run calls asyncio.run and just awaits an infinite async event loop: https://github.com/encode/uvicorn/blob/33446fe375597114257ec0822b408a7e13bff20c/uvicorn/server.py#L215
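Because uvicorn.Server.run boils down to asyncio.run(...) executing on a worker thread, a reproduction that depends only on the standard library might look like the sketch below. BackgroundServer, its methods, and the port=0 choice are hypothetical names for illustration, not uvicorn's API, and this has not been confirmed to hang under reticulate in the same way.

```python
import asyncio
import threading


class BackgroundServer:
    """Hypothetical stand-in for uvicorn.Server, built only on asyncio."""

    def __init__(self, host="127.0.0.1", port=0):
        self.host = host
        self.port = port  # 0 asks the OS for a free port
        self.started = threading.Event()
        self._loop = None
        self._stop = None

    async def _handle(self, reader, writer):
        # Consume the request line and headers up to the blank line.
        while (await reader.readline()) not in (b"\r\n", b"\n", b""):
            pass
        body = b"Hello, world!"
        head = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\nConnection: close\r\n\r\n"
        writer.write(head % len(body) + body)
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def _serve(self):
        self._loop = asyncio.get_running_loop()
        self._stop = asyncio.Event()
        server = await asyncio.start_server(self._handle, self.host, self.port)
        self.port = server.sockets[0].getsockname()[1]
        self.started.set()
        try:
            await self._stop.wait()  # block until stop(), like a server main loop
        finally:
            server.close()

    def run(self):
        # Blocking call, intended as a threading.Thread target.
        asyncio.run(self._serve())

    def stop(self):
        # Safe to call from another thread once `started` is set.
        self._loop.call_soon_threadsafe(self._stop.set)


server = BackgroundServer()
server_thread = threading.Thread(target=server.run, daemon=True)
```

In plain Python, calling server_thread.start() and then server.started.wait(5) should return True almost immediately. If the same two calls stall when driven from R, that would point at the embedding rather than at uvicorn.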
Interesting... this seems to work for me on macOS with Python 3.9.13 and R 4.2 (both Anaconda and "plain" Python seem to be fine). Can you share any other system details?
This is what I see at the end:
>>> for i in range(4):
...     with ping.open(mode="a") as file:
...         file.write(f"{i} ping\n")
...     time.sleep(3)
...
... should_exit = True
7
7
7
7
>>> thread.join()
Do you see the files on your local machine afterwards?
python --version
# Python 3.9.12
R.version
               _
platform       x86_64-apple-darwin17.0
arch           x86_64
os             darwin17.0
system         x86_64, darwin17.0
status
major          4
minor          1.2
year           2021
month          11
day            01
svn rev        81115
language       R
version.string R version 4.1.2 (2021-11-01)
nickname       Bird Hippie
The REPL pauses when running the script:
> reticulate::repl_python()
Python 3.9.12 (/Users/manzt/dev/miniforge3/envs/r-reticulate/bin/python3.9)
Reticulate 1.25 REPL -- A Python interpreter in R.
Enter 'exit' or 'quit' to exit the REPL and return to R.
>>> import time
>>> import threading
>>> import pathlib
>>>
>>> ping = pathlib.Path.cwd() / "ping.txt"
>>> ping.touch()
>>> should_exit = False
>>>
>>> def listen():
... pong = pathlib.Path.cwd() / "pong.txt"
...
>>> # wait for file to be created
>>> while not ping.is_file():
... time.sleep(0.5)
...
>>> with ping.open(mode="r") as file:
... prev_contents = file.read()
...
>>> count = 0
>>> while not should_exit:
... time.sleep(0.5)
... <--- # hangs here
And it's not until I press "STOP" in RStudio (or CMD + C) that the rest prints to the console:
>>> with ping.open(mode="r") as file:
... contents = file.read()
...
>>> if prev_contents != contents:
... prev_contents = contents
... with pong.open(mode="a") as out:
... out.write(f"{count} pong\n")
... count += 1
...
>>>
>>> thread = threading.Thread(target=listen)
>>> thread.start()
>>>
>>> for i in range(4):
...     with ping.open(mode="a") as file:
...         file.write(f"{i} ping\n")
...     time.sleep(3)
...
7
7
7
7
>>> should_exit = True
>>> thread.join()
and then no files are written except for ping.txt.
I'm not sure if it would be useful to watch the execution, but here it is....
In the video I:
1.) run the script and it hangs
2.) "STOP" the console, and then it executes the for loop and writes to ping.txt
3.) show that ping.txt has been written, but since that part of the code executed after stopping (?) the thread, nothing is written to pong.txt
Ah, I see the problem now. For your case, the problem here is that RStudio is interpreting your code as a bunch of separate statements, rather than a single function definition. This is more apparent if you see where the green execution line in the gutter is pausing -- it's basically just in an infinite loop running while not should_exit: time.sleep(0.5).
I think this is a bug that needs to be fixed on the RStudio side -- I've filed https://github.com/rstudio/rstudio/issues/11665 to track this.
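As a rough analogy only (this uses Python's standard code module, not RStudio's or reticulate's actual REPL machinery), the sketch below shows how a line-by-line console treats a blank line as the end of a function definition. That matches the transcript above, where listen() ends after its first line and the remaining body is evaluated as separate top-level statements.

```python
import code

# A stdlib console that accepts input one line at a time, like a REPL.
console = code.InteractiveConsole()

lines = [
    "def listen():",
    "    pong = 'pong.txt'",
    "",  # the blank line completes the definition right here
]

# push() returns True while the console is still waiting for more input.
needs_more = [console.push(line) for line in lines]

# The function now exists with a one-line body; later lines of the original
# body would be seen as new statements, not as part of listen().
listen_defined = callable(console.locals.get("listen"))
```

Sending the whole file at once, for example with reticulate::py_run_file() as in the later comments, sidesteps this because the source is compiled as a single module rather than statement by statement.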
> Ah, I see the problem now. For your case, the problem here is that RStudio is interpreting your code as a bunch of separate statements, rather than a single function definition. This is more apparent if you see where the green execution line in the gutter is pausing -- it's basically just in an infinite loop running while not should_exit: time.sleep(0.5).
Thanks for the response. Unfortunately, I think in an attempt to make a more simplified example my contrived ping-pong snippet uncovered a separate issue (which you correctly identified about RStudio). The original issue persists outside of RStudio.
background-server.py
import threading
import uvicorn
async def app(scope, _receive, send):
    await send(dict(type='http.response.start', status=200))
    await send(dict(type='http.response.body', body=b'Hello, world!'))
server = uvicorn.Server(config=uvicorn.Config(app=app, port=8000))
server_thread = threading.Thread(target=server.run, daemon=True)
❯ R
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> reticulate::use_python('/Users/manzt/dev/miniforge3/envs/r-reticulate/bin/python')
> reticulate::py_run_file("background-server.py")
> reticulate::py$server_thread$start()
> reticulate::py$server$started
[1] FALSE
>
> ipython
Python 3.9.12 (main, Jun 1 2022, 06:36:29)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: %run background-server.py
In [2]: server_thread.start()
In [3]: INFO: Started server process [72792]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000/ (Press CTRL+C to quit)
In [3]:
In [3]: server.started
Out[3]: True
In [4]:
The ipython terminal correctly spins up the async server in a separate thread and responds with Hello, world! at http://localhost:8000. The R execution never starts the server. I have been banging my head trying to come up with an example using only Python builtins, but currently this is the simplest example I can come up with.
EDIT: I removed starlette as a dependency in the background-server.py snippet.
EDIT: The same issue occurs with the other most popular async Python webserver, hypercorn. Modified background-server.py:
Well, I can attest that @manzt's example above doesn't work for me either:
> reticulate::py_run_file("background-server.py")
> reticulate::py$server_thread$start()
> reticulate::py$server$started
[1] FALSE
>
It happens with hypercorn too.
Both uvicorn and hypercorn are pretty popular frameworks for ASGI servers.
e.g. +5K stars on https://github.com/encode/uvicorn
> But maybe we need a similar analogue in the other direction?
@kevinushey Maybe this is worth a second thought....
It could be worth opening up a separate GitHub issue as well; this got a bit long (and we found tangential issues)
I think @kevinushey diagnosed this correctly: we need to let the Python event loop run occasionally from the R side. E.g., you can see the uvicorn server thread run if you call Python sleep from the main Python thread.
> reticulate::py_run_file("background-server.py")
> reticulate::py$server_thread$start()
> reticulate::py$server$started
[1] FALSE
> reticulate::py_run_string("from time import sleep; sleep(2)")
INFO: Started server process [62213]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
> reticulate::py$server$started
[1] TRUE
Until this is added to reticulate, you can add something like this to your R project to yield to the Python runtime on a somewhat regular schedule:
local({
  reticulate::py_run_string("from time import sleep")
  py_yield_and_register_next_yield <- function() {
    reticulate::py_eval("sleep(0.001)")
    later::later(py_yield_and_register_next_yield, .1)
    invisible()
  }
  later::later(py_yield_and_register_next_yield)
})
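The same yielding idea can also be sketched on the Python side, for cases where R only needs to block until a background thread reports that it is ready. wait_until is a hypothetical helper, not part of reticulate or uvicorn. It relies on time.sleep() releasing the GIL, which is presumably what lets the server thread make progress in the transcript above.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it is truthy or `timeout` seconds elapse.

    Each time.sleep() call releases the GIL, giving background threads a
    chance to run. Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Stand-in for a background server thread that flips a flag once it is ready.
ready = threading.Event()
worker = threading.Thread(
    target=lambda: (time.sleep(0.2), ready.set()), daemon=True
)
worker.start()

started = wait_until(ready.is_set, timeout=2.0)
```

With background-server.py loaded, something like reticulate::py_run_string("wait_until(lambda: server.started)") could replace the fixed sleep(2), assuming wait_until has been defined in the same Python session. It only helps while the call is running, so the periodic yield above is still needed to keep serving requests afterwards.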
If I use threads (using threading) in the Python function called from reticulate, or use async methods, those threads never finish, and all Python threads just get killed after the main thread exits. What are the ways to mitigate this?
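One possible mitigation, assuming the work can be allowed to finish before the function returns control to R, is to wait for it inside Python: join the threads, or run the coroutine to completion with asyncio.run. The helper names below (run_in_threads, run_coroutine, double) are hypothetical, and this is a sketch that has not been verified under reticulate.

```python
import asyncio
import threading


def run_in_threads(jobs):
    """Run zero-argument callables in threads and wait for all of them."""
    results = [None] * len(jobs)

    def worker(index, job):
        results[index] = job()

    threads = [
        threading.Thread(target=worker, args=(i, job))
        for i, job in enumerate(jobs)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # blocks, with the GIL released, until the thread finishes
    return results


def run_coroutine(coro_fn, *args):
    """Run an async function to completion before returning its result."""
    return asyncio.run(coro_fn(*args))


async def double(x):
    # Placeholder for real asynchronous work.
    await asyncio.sleep(0.01)
    return 2 * x


thread_results = run_in_threads([lambda: 1 + 1, lambda: "done"])
async_result = run_coroutine(double, 21)
```

Because the waiting happens inside a single Python call, the threads get to run while that call is blocked. For work that must outlive the call, the periodic yield from R shown earlier in the thread is the other option discussed here.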