rstudio / reticulate

R Interface to Python
https://rstudio.github.io/reticulate
Apache License 2.0

calling python functions that use threading or async #515

Open sebastiangonsal opened 5 years ago

sebastiangonsal commented 5 years ago

If I use threads (via the threading module) in a Python function called from reticulate, or use async methods, those threads never finish; all Python threads are simply killed once the main thread exits. What are the ways to mitigate this?
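One possible mitigation, sketched under the assumption that the work can be waited on before the Python function returns: have the function join its own workers, so nothing is left pending when control goes back to the caller. The names `run_all` and `square` are hypothetical, and this has not been verified under reticulate.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Stand-in for real per-item work (hypothetical).
    return x * x


def run_all(func, items, max_workers=4):
    # Leaving the `with` block waits for every worker thread to finish,
    # so no thread is still running when this function returns.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))
```

This only helps when the caller can afford to block; it does not cover long-lived background threads such as servers.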

manzt commented 2 years ago

I believe I am running into a similar issue. I start a web server on a separate thread in my Python script, and the thread just seems to time out and will not handle requests. Minimal example:

Setup

conda create -n r-reticulate python=3.9
conda activate r-reticulate
pip install uvicorn starlette

Example.Rmd

import uvicorn
import starlette
import sys

print(f"# {sys.version=}")
print(f"# {uvicorn.__version__=}")
print(f"# {starlette.__version__=}")

# sys.version='3.9.12 (main, Jun  1 2022, 06:39:55) \n[Clang 12.0.0 ]'
# uvicorn.__version__='0.18.2'
# starlette.__version__='0.20.4'
from starlette.applications import Starlette
from starlette.routing import Route
from starlette.responses import JSONResponse
import threading
import uvicorn

async def homepage(request):
    return JSONResponse({'hello': 'world'})

config = uvicorn.Config(
  app=Starlette(debug=True, routes=[Route('/', homepage)]),
  port=8000,
  log_level="debug",
)

server = uvicorn.Server(config=config)
server_thread = threading.Thread(target=server.run, daemon=True)
reticulate::py$server_thread$start() # start server in background thread

Then navigate to http://localhost:8000 in browser.

Expected behavior

Webpage loads with { hello: world } JSON response.

Actual behavior

The request just hangs and the server never fulfills it. Running the app on the main thread with reticulate::py$server$run() works just fine, so I'm thinking this is very likely an issue with how threads are handled in reticulate.

R.version

platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          1.2                         
year           2021                        
month          11                          
day            01                          
svn rev        81115                       
language       R                           
version.string R version 4.1.2 (2021-11-01)
nickname       Bird Hippie

evanbiederstedt commented 2 years ago

I'm having issues with multithreading as well.

There may be an issue with Python multithreading here, and a possibly related issue with ASGI servers (like uvicorn above).

I'm happy to help with a fix, but I'm not sure what the issue is at the moment.

Ideas?

CC @t-kalinowski @kevinushey

kevinushey commented 2 years ago

A "basic" threading example, as e.g. from here, seems to work for me:

import time
import threading

def hello():
   for i in range(3):
      print(f"Hello from thread! This is iteration {i}")
      time.sleep(1)

thread = threading.Thread(target = hello)      
thread.start()
thread.join()

So we probably need some more information in order to dive into why this particular example doesn't work as expected. Preferably, with a reproducible example depending only on threading or other "base" Python modules.

kevinushey commented 2 years ago

That said, my guess here is that we need to make sure that R runs the Python event loop whenever R_ProcessEvents is called. Right now we do some work to make sure the R event loop runs when Python is busy here:

https://github.com/rstudio/reticulate/blob/main/src/event_loop.cpp

But maybe we need a similar analogue in the other direction?

evanbiederstedt commented 2 years ago

Thanks for the quick response @kevinushey

I think you're right that this is more than a multithreading issue; I was incorrect.

So we probably need some more information in order to dive into why this particular example doesn't work as expected. Preferably, with a reproducible example depending only on threading or other "base" Python modules.

The above example by @manzt uses https://github.com/encode/uvicorn which (as far as I can tell) just relies on asyncio in CPython to create a Python framework for an ASGI server. That leads us back to Py_AddPendingCall I think: https://docs.python.org/3/c-api/init.html#asynchronous-notifications

But maybe we need a similar analogue in the other direction?

This sounds promising. How....would you do that?

kevinushey commented 2 years ago

This sounds promising. How....would you do that?

I do not know :-) This will take some extra investigation; I'm not familiar with the internals of asyncio (or if there are extra considerations to be aware of for applications embedding Python).

manzt commented 2 years ago

Thanks for the quick response.

Here's a more contrived example which I think illustrates the issue. This program creates two files ping.txt and pong.txt. The main thread writes ping to the ping.txt file, while another thread runs an infinite loop that watches for changes and writes pong to pong.txt. This program runs fine with python ping-pong.py but completely hangs in R.

import time
import threading
import pathlib

ping = pathlib.Path.cwd() / "ping.txt"
ping.touch()
should_exit = False

def listen():
    pong = pathlib.Path.cwd() / "pong.txt"

    # wait for file to be created
    while not ping.is_file():
        time.sleep(0.5)

    with ping.open(mode="r") as file:
        prev_contents = file.read()

    count = 0
    while not should_exit:
        time.sleep(0.5)

        with ping.open(mode="r") as file:
            contents = file.read()

        if prev_contents != contents:
            prev_contents = contents
            with pong.open(mode="a") as out:
                out.write(f"{count} pong\n")
            count += 1

thread = threading.Thread(target=listen)
thread.start()

for i in range(4):
    with ping.open(mode="a") as file:
        file.write(f"{i} ping\n")
        time.sleep(3)

should_exit = True
thread.join()

I believe this is similar because uvicorn.Server.run is blocking and runs an infinite loop on a separate thread. Under the hood, uvicorn.Server.run calls asyncio.run and just awaits an infinite async event loop https://github.com/encode/uvicorn/blob/33446fe375597114257ec0822b408a7e13bff20c/uvicorn/server.py#L215
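For a reproduction that depends only on "base" Python modules, a rough stdlib analogue of the same pattern is an asyncio server run via asyncio.run on a daemon thread. This is a sketch, assuming the asyncio loop on a background thread is the relevant ingredient; it has not been confirmed to reproduce the hang under reticulate. The names `handle`, `main`, `fetch`, `stop`, `started` and `state` are made up for illustration.

```python
import asyncio
import socket
import threading

started = threading.Event()  # set once the server is accepting connections
state = {}                   # filled in by the background thread


async def handle(reader, writer):
    # Reply with a fixed payload, then close the connection.
    writer.write(b"Hello, world!")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    state["port"] = server.sockets[0].getsockname()[1]
    state["loop"] = asyncio.get_running_loop()
    state["stop"] = asyncio.Event()
    started.set()
    async with server:
        await state["stop"].wait()  # serve until stop() is called


server_thread = threading.Thread(target=lambda: asyncio.run(main()), daemon=True)


def fetch(timeout=5.0):
    # Called from the main thread: wait for startup, then read the reply.
    if not started.wait(timeout):
        raise TimeoutError("server thread never started")
    chunks = []
    with socket.create_connection(("127.0.0.1", state["port"]), timeout=timeout) as conn:
        while chunk := conn.recv(1024):
            chunks.append(chunk)
    return b"".join(chunks)


def stop():
    # Ask the loop (owned by the other thread) to shut down, then join.
    state["loop"].call_soon_threadsafe(state["stop"].set)
    server_thread.join()
```

In plain Python, `server_thread.start()` followed by `fetch()` returns the payload. If the diagnosis in this thread is right, the same two calls issued from R would be expected to stall at the `started.wait` step unless the main thread gives the background thread a chance to run.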

kevinushey commented 2 years ago

Interesting... this seems to work for me on macOS with Python 3.9.13 and R 4.2 (both Anaconda and "plain" Python seem to be fine). Can you share any other system details?

This is what I see at the end:

>>> for i in range(4):
...     with ping.open(mode="a") as file:
...         file.write(f"{i} ping\n")
...         time.sleep(3)
...         
... should_exit = True
7
7
7
7
>>> thread.join()

manzt commented 2 years ago

Do you see the files on your local machine afterwards?

info (macOS 12.3.1)

python --version
# Python 3.9.12
R.version
               _                           
platform       x86_64-apple-darwin17.0     
arch           x86_64                      
os             darwin17.0                  
system         x86_64, darwin17.0          
status                                     
major          4                           
minor          1.2                         
year           2021                        
month          11                          
day            01                          
svn rev        81115                       
language       R                           
version.string R version 4.1.2 (2021-11-01)
nickname       Bird Hippie  

REPL execution

The REPL pauses when running the script,

> reticulate::repl_python()
Python 3.9.12 (/Users/manzt/dev/miniforge3/envs/r-reticulate/bin/python3.9)
Reticulate 1.25 REPL -- A Python interpreter in R.
Enter 'exit' or 'quit' to exit the REPL and return to R.
>>> import time
>>> import threading
>>> import pathlib
>>> 
>>> ping = pathlib.Path.cwd() / "ping.txt"
>>> ping.touch()
>>> should_exit = False
>>> 
>>> def listen():
...     pong = pathlib.Path.cwd() / "pong.txt"
... 
>>>     # wait for file to be created
>>>     while not ping.is_file():
...         time.sleep(0.5)
... 
>>>     with ping.open(mode="r") as file:
...         prev_contents = file.read()
... 
>>>     count = 0
>>>     while not should_exit:
...         time.sleep(0.5)
...  <--- # hangs here

And it's not until I press "STOP" in RStudio (or CMD + C) that the rest prints to the console:

>>>         with ping.open(mode="r") as file:
...             contents = file.read()
... 
>>>         if prev_contents != contents:
...             prev_contents = contents
...             with pong.open(mode="a") as out:
...                 out.write(f"{count} pong\n")
...             count += 1
... 
>>> 
>>> thread = threading.Thread(target=listen)
>>> thread.start()
>>> 
>>> for i in range(4):
...     with ping.open(mode="a") as file:
...         file.write(f"{i} ping\n")
...         time.sleep(3)
... 
7
7
7
7
>>> should_exit = True
>>> thread.join()

and then no files are written except for ping.txt.

manzt commented 2 years ago

I'm not sure if it would be useful to watch the execution, but here it is....

In the video I:

1.) run the script and it hangs
2.) "STOP" the console, and then it executes the for loop and writes to ping.txt
3.) show that ping.txt has been written, but since that part of the code executed after stopping (?) the thread, nothing is written to pong.txt

https://user-images.githubusercontent.com/24403730/180868613-e49de54e-bc11-4664-a2fe-4b27615f1fc4.mov

kevinushey commented 2 years ago

Ah, I see the problem now. For your case, the problem here is that RStudio is interpreting your code as a bunch of separate statements, rather than a single function definition. This is more apparent if you see where the green execution line in the gutter is pausing -- it's basically just in an infinite loop running while not should_exit: time.sleep(0.5).

I think this is a bug that needs to be fixed on the RStudio side -- I've filed https://github.com/rstudio/rstudio/issues/11665 to track this.

manzt commented 2 years ago

Ah, I see the problem now. For your case, the problem here is that RStudio is interpreting your code as a bunch of separate statements, rather than a single function definition. This is more apparent if you see where the green execution line in the gutter is pausing -- it's basically just in an infinite loop running while not should_exit: time.sleep(0.5).

Thanks for the response. Unfortunately, I think in an attempt to make a more simplified example my contrived ping-pong snippet uncovered a separate issue (which you correctly identified about RStudio). The original issue persists outside of RStudio.

background-server.py

import threading
import uvicorn

async def app(scope, _receive, send):
    await send(dict(type='http.response.start', status=200))
    await send(dict(type='http.response.body', body=b'Hello, world!'))

server = uvicorn.Server(config=uvicorn.Config(app=app, port=8000))
server_thread = threading.Thread(target=server.run, daemon=True)

R REPL

❯ R
R version 4.1.2 (2021-11-01) -- "Bird Hippie"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> reticulate::use_python('/Users/manzt/dev/miniforge3/envs/r-reticulate/bin/python')
> reticulate::py_run_file("background-server.py")
> reticulate::py$server_thread$start()
> reticulate::py$server$started
[1] FALSE
>

IPython

> ipython
Python 3.9.12 (main, Jun  1 2022, 06:36:29)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %run background-server.py

In [2]: server_thread.start()

In [3]: INFO:     Started server process [72792]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000/ (Press CTRL+C to quit)
In [3]:

In [3]: server.started
Out[3]: True

In [4]:

The ipython terminal correctly spins up the async server in a separate thread and responds with Hello, world! at http://localhost:8000. The R execution never starts the server. I have been banging my head trying to come up with an example with python builtins but currently this is the simplest example I can come up with.

EDIT: I removed starlette as a dependency in the background-server.py snippet.

EDIT: The same issue occurs with the other most popular async Python webserver, hypercorn. Modified background-server.py:

```python
import asyncio
import threading

from hypercorn.config import Config
from hypercorn.asyncio import serve

async def app(scope, _receive, send):
    assert scope["type"] == "http"
    await send(dict(type="http.response.start", status=200))
    await send(dict(type="http.response.body", body=b"Hello, world!"))

def run():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    config = Config()
    config.bind = [f"localhost:8000"]
    # As of Hypercorn 0.11.0, need to explicitly set signal handlers to a no-op
    # (otherwise it will try to set signal handlers assuming it is on the main thread which throws an error)
    loop.run_until_complete(
        serve(app, config, shutdown_trigger=lambda: asyncio.Future())
    )
    loop.close()

server_thread = threading.Thread(target=run, daemon=True)
```

evanbiederstedt commented 2 years ago

Well, I can attest that @manzt's example above doesn't work for me either:

> reticulate::py_run_file("background-server.py")
> reticulate::py$server_thread$start()
> reticulate::py$server$started
[1] FALSE
> 

It happens with hypercorn too.

Both uvicorn and hypercorn are pretty popular frameworks for ASGI servers.

e.g. +5K stars on https://github.com/encode/uvicorn

But maybe we need a similar analogue in the other direction?

@kevinushey Maybe this is worth a second thought....

It could be worth opening up a separate GitHub issue as well; this got a bit long (and we found tangential issues)

t-kalinowski commented 2 years ago

I think @kevinushey diagnosed this correctly: we need to let the Python event loop run occasionally from the R side. E.g., you can see the uvicorn server thread run if you call Python sleep from the main Python thread.

> reticulate::py_run_file("background-server.py")
> reticulate::py$server_thread$start()
> reticulate::py$server$started
[1] FALSE
> reticulate::py_run_string("from time import sleep; sleep(2)")
INFO:     Started server process [62213]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
> reticulate::py$server$started
[1] TRUE

t-kalinowski commented 2 years ago

Until this is added to reticulate, you can add something like this to your R project to yield to the Python runtime on a somewhat regular schedule:

local({
  reticulate::py_run_string("from time import sleep")
  py_yield_and_register_next_yield <- function() {
    reticulate::py_eval("sleep(0.001)")
    later::later(py_yield_and_register_next_yield, .1)
    invisible()
  }
  later::later(py_yield_and_register_next_yield)
})
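
A complementary sketch on the Python side, for cases where R only needs to block until some condition holds (for example, until server.started is true). It assumes, based on the sleep experiment above, that a blocking time.sleep on the main Python thread is enough to let the background thread make progress. The helper name `wait_until` and its parameters are hypothetical and not part of reticulate.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.01):
    # Poll `predicate` until it returns a truthy value or `timeout` seconds pass.
    # Each time.sleep call blocks the calling thread, which gives other
    # Python threads a chance to run in between checks.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition met right at the deadline is not missed.
    return bool(predicate())
```

With background-server.py loaded, something like `wait_until(lambda: server.started)` could be run through reticulate::py_run_string after starting the thread. Unlike the later-based loop, this only yields while the call is in progress, so it suits one-off waits rather than keeping a server responsive.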