mdklatt / httpexec

Execute CLI commands over HTTP via a REST API.
MIT License
0 stars 0 forks source link

First request to new instance sometimes fails #1

Closed mdklatt closed 1 year ago

mdklatt commented 1 year ago

A downstream app is experiencing an issue where the first request to a new httpexec instance sometimes fails. The httpexec app is being served by hypercorn inside a Docker container. This issue cannot be duplicated here in testing, so it might be a downstream problem .

[2023-02-14 16:59:51,759] ERROR in app: Exception on request POST /app
Traceback (most recent call last):
  File "/opt/app/lib/python3.9/site-packages/quart/app.py", line 1489, in handle_request
    return await self.full_dispatch_request(request_context)
  File "/opt/app/lib/python3.9/site-packages/quart/app.py", line 1507, in full_dispatch_request
    await self.try_trigger_before_first_request_functions()
  File "/opt/app/lib/python3.9/site-packages/quart/app.py", line 1420, in try_trigger_before_first_request_functions
    async with self._first_request_lock:
  File "/usr/lib/python3.9/asyncio/locks.py", line 14, in __aenter__
    await self.acquire()
  File "/usr/lib/python3.9/asyncio/locks.py", line 120, in acquire
    await fut
RuntimeError: Task <Task pending name='Task-16' coro=<ASGIHTTPConnection.handle_request() running at /opt/app/lib/python3.9/site-packages/quart/asgi.py:90> cb=[_wait.<locals>._on_completion() at /usr/lib/python3.9/asyncio/tasks.py:509]> got Future <Future pending> attached to a different loop
[2023-02-14 16:59:51 +0000] [8] [INFO] 172.31.0.3:41390 - - [14/Feb/2023:16:59:51 +0000] "POST /app1.1" 500 265 "-" "Python/3.9 aiohttp/3.8.4"
mdklatt commented 1 year ago

The stack trace shows that the error is originating with try_trigger_before_first_request_functions(), which is called the config() function here. A problem with a before_first_request handler would explain why the bug is only observed with the first request to a new instance. But why does it only happen sometimes? Also, the config() function is ultimately successful, because subsequent requests always succeed.

@app.before_first_request
def config():
    """ Get configuration settings.

    """
    # Environment variables take precedence over the config file. Config file
    # keys must be ALL CAPS and at the root level.
    file = Path(environ.get("HTTPEXEC_CONFIG_PATH",  "etc/config.toml"))
    app.config.from_file(Path.cwd() / file, load)
    app.config.from_prefixed_env("HTTPEXEC")
    return
mdklatt commented 1 year ago

Cannot find any relevant quart or hypercorn issues, but the "Future attached to a different loop" error message is suggestive of an inconsistent async usage.

mdklatt commented 1 year ago

Here the config() function is not async. The before_first_request() docs say,

if used to decorate a synchronous function, the function will be wrapped in [run_sync()] (https://quart.palletsprojects.com/en/latest/reference/source/quart.utils.html#quart.utils.run_sync) and run in a thread executor (with the wrapped function returned).

So, this function should not have to be marked as async, but this run_sync() adapter may not play well with hypercorn in production. Also, cannot rule this out as a Docker issue, but that seems unlikely.

mdklatt commented 1 year ago

In downstream testing, marking config() as async eliminates the coroutine errors. Without being able to duplicate the issue here, this is only circumstantial evidence that async is necessary. However, given that there is no reason not to use async, go ahead with this change.

mdklatt commented 1 year ago

There's also a before_serving handler. How is this different than before_first_request? Is this more appropriate to use for config()?

mdklatt commented 1 year ago

Replacing @before_first_request with @before_serving causes unit tests to fail. This may just be an artifact of the way the test environment is defined, but using async config() with @before_first_request is known to work here in testing, and seems to fix the downstream app issues, so just use that.

mdklatt commented 1 year ago

Testing with the affected downstream applications is satisfactory. Issue resolved by commit 5e4ef80cbca97d510fbb69d47b39288b827c8fe2.