Cosider making the API end-to-end async

rafaelcaricio commented 3 years ago

Describe the bug, issue or concern

My concern is that the app is developed using FastAPI with async operations, but external requests are made syncronous. The API would perform better if the end-to-end request would be async.

To Reproduce

Example code snippet where there is a synchronous external request that blocks the eventloop:

https://github.com/minvws/nl-covid19-coronacheck-backend-bizrules-signing-service/blob/main/api/app.py#L132-L136

If one follows the code path of that API handler, we can find external requests made in synchronous mode. Reference:

https://github.com/minvws/nl-covid19-coronacheck-backend-bizrules-signing-service/blob/a93ba6bf1783d01d88be26b1d3f9af4b048f1515/api/http_utils.py#L94

Expected behavior

Make all external API requests also in async mode. Maybe consider not using requests (provides sync API only) and use httpx or aiohttp.

Additional context

Let me know if this is something you would like to explore. I would be willing to send a PR with code changes related to this issue.

Governance

[x] I've read the contributing document https://github.com/minvws/.github/blob/master/CONTRIBUTING.md
[x] I've read and understand the Code of Conduct https://github.com/minvws/.github/blob/master/CODE_OF_CONDUCT.md
[x] I understand that any contributions or suggestions I made may make it into the actual code. I've read the License https://github.com/minvws/nl-covid19-notification-app-coordination/blob/master/LICENSE.txt and the contributor license agreement https://github.com/minvws/nl-covid19-notification-app-coordination/blob/master/CLA.md

wasperen commented 3 years ago

Thanks for observing this. We think this is a great idea and may, under certain circumstances, help in boosting performance. For now, the team is focused on regular maintenance and some additional features. That particular set of calls in your example is a good candidate.

Some further background:w The actual signing is done using dedicated signing software and equipment further down the chain and however sweet they are, they tend to be the bottleneck. With the horsepower we currently have for the back-end logic you refer to, we are not certain this would help.

That is because both signatures need to be in before we can return from this function - so we need to wait one way or the other. At the moment, we are uncertain if waiting on both responses in parallel would make a big impact as the specialized signing infrastructure will hold us up. Sending more requests to an already throttling set of endpoints will not necessarily make them faster. If we can somehow release more power at that point, and they are no longer throttling, then it make perfect sense to parallelize these two calls by async-ing them.

This one on our issue list since May21 (as Inge-4 issue nr 30) and will certainly investigate if this would help - as soon as the storm calms. Sending example code is always welcome!

rafaelcaricio commented 3 years ago

Hi @wasperen,

Thank you for the reply and the background information on how the process works down the line. I think there is some misunderstanding on what is the problem I'm trying to point out in this ticket. I will try to explain more clearly.

Making the changes I proposed will not parallelize the calls to the equipment, the calls for the same request will still be sequential, but those external calls will not block other requests (in other parts of the application that there is nothing to do with signing) that are coming into the same uvicorn worker. Whenever a request is executing to the dedicated signing software and equipment all other requests for the uvicorn worker stops or cannot be handled, because the eventloop is blocked on that sync call.

I've put together a very much simplified example to illustrate what I'm talking about.

The requirements.txt file for this example, with the pinned exact versions used by the current nl-covid19-coronacheck-backend-bizrules-signing-service main branch.

# file requirements.txt
fastapi[all]==0.65.1
uvicorn[standard]==0.13.4
requests==2.25.1
httpx

Let's look into this code example:

# file app/application.py
from fastapi import FastAPI
import requests
import httpx
import uuid

app = FastAPI()

EXTERNAL_API_ENDPOINT = "http://localhost:8888"

@app.get("/healthcheck")
async def healthcheck():
    return "OK"

#
# Async mixed with blocking
#

def internal_signing_op(op_num: int, request_id: str) -> None:
    session = requests.Session()
    response = session.request("GET", EXTERNAL_API_ENDPOINT, timeout=2000)
    print(f"{request_id} {op_num}: {response}")

def sign_op1(request_id: str) -> None:
    internal_signing_op(1, request_id)

def sign_op2(request_id: str) -> None:
    internal_signing_op(2, request_id)

@app.get("/async-blocking")
async def root():
    request_id = str(uuid.uuid4())

    print(f"{request_id}: started processing")

    sign_op1(request_id)
    sign_op2(request_id)

    print(f"{request_id}: finished!")
    return {"message": "hello world"}

Here we have a simple application that tries to replicate the current behavior that I'm trying to point out. We have mixed async code with the synchronous library requests. The code works fine, but there is one problem. In order to understand the problem we need to recap on how uvicorn works. I'm sorry if you already knows this, but I just want to make sure we all have the same picture in our minds. Uvicorn executes our application server by spawning workers (OS sub-process) that handles the requests coming in to our server. Every worker (sub-process) is a fully featured CPython instance and has it's own I/O loop that runs our FastAPI application.

The Main Process holds a socket that is shared with the workers and accepts the HTTP requests that are handled by the workers to actually process the request. We can have as many workers as we want, usually the number of CPU cores. In our case, to make it easier to analyse the behavior, we are going to run only a single worker. We execute our server with the following command:

uvicorn app.application:app --workers 1

I've setup a fake external API that we will use for this example. Just a simple server that takes long to execute some obscure operation (time.sleep() 😄 ).

# file external_api.py
from fastapi import FastAPI
import time

app = FastAPI()

@app.get("/")
async def root():
    time.sleep(20)
    return "OK"

We spin up the external API server using this command:

uvicorn external_api:app --port 8888 --workers 5

We set 5 workers here for no good reason, the important part here is to make the external API run in the port 8888 which is the one we hardcoded in our example application.

Full work tree of the example for reference:

.
├── app
│   ├── __init__.py
│   └── application.py
├── external_api.py
└── requirements.txt

1 directory, 4 files

Now we can call our application with mixed async and sync code and observe what is printed out. I used httpie to make the requests. I've opened two consoles and make to distinct HTTP requests to our application within the 20 seconds timeframe. This is the output:

As we can observe in the output that even tough I made both requests "in parallel" (same second) the server only accepted the request/started processing the second request (5b48b300-964a-4db8-9d72-7b41f52b9c16) after a full execution of the first request (63563037-804d-4391-a363-35babaa8db90) which took a full 40 seconds. During the whole 40 seconds there was no task switching and the worker event loop was completely blocked. All requests to the API are stale for the full 40 seconds, including requests to any other endpoints that might exist in other parts of the application. Even if the other requests don't call the external API, they cannot execute because the worker event loop is blocked. If we call the /healthcheck endpoint it will not execute either.

One way to hide this problem and have our server still accepting multiple requests when the workers are blocked is to increase the number of workers. But those new workers can also be blocked on sync calls. The better way to improve throughput in the API overall and not let our workers be blocked, is to implement async end to end in our application. Let's now implement a v2 version of our example API, still calling the same fake external API that takes 20 seconds to process. Also we will again run uvicorn with a single worker. Here is the code:

# file app/application.py
#
# Async end-to-end
#

async def v2_internal_signing_op(op_num: int, request_id: str) -> None:
    async with httpx.AsyncClient() as session:
        response = await session.request("GET", EXTERNAL_API_ENDPOINT, timeout=2000)
    print(f"{request_id} {op_num}: {response}")

async def v2_sign_op1(request_id: str) -> None:
    await v2_internal_signing_op(1, request_id)

async def v2_sign_op2(request_id: str) -> None:
    await v2_internal_signing_op(2, request_id)

@app.get("/all-async")
async def v2_root():
    request_id = str(uuid.uuid4())

    print(f"{request_id}: started processing")

    await v2_sign_op1(request_id)
    await v2_sign_op2(request_id)

    print(f"{request_id}: finished!")
    return {"message": "hello world"}

Notice that I've replaced the requests library by the httpx library which supports async HTTP calls and has an API that is very similar to the one requests provide. The code is functionally equivalent to our previous mixed implementation, but now we implemented async fully. Let's execute our API using the same commands as before.

uvicorn app.application:app --workers 1

Then calling the API using httpie, but to the fully async endpoint:

http localhost:8000/all-async

The output is:

We can observe in the output that both requests started processing immediately and they are still sequential in their own request lifecycle. The eventloop of the uvicorn worker is not blocked, that is why the second request could continue processing even though the external API did not finish its operation. Other requests, like the /healthcheck, are not impacted by the slow execution of the external API. Overall our application continues to server other request that don't depend on the external API.

I'm not suggesting parallel processing, but letting execution of the application be more concurrent by not blocking the I/O loop. The overall throughput and latency of the application will be better as more unrelated requests can be processed by the same uvicorn worker.

Let me know if you think I'm wrong or if you find this irrelevant. I hope that could illustrate better what I'm trying to suggest here.

wasperen commented 3 years ago

completely understood and thank you very much for this elaborate example! I have linked this issue to our internal issue least where async is listed as a possible enhancement.

minvws / nl-covid19-coronacheck-backend-bizrules-signing-service