litestar-org / litestar

Production-ready, Light, Flexible and Extensible ASGI API framework | Effortlessly Build Performant APIs
https://litestar.dev/
MIT License

Bug: LoggingMiddleware stores all responses permanently in memory (memory leak) #1137

Closed: chris-telemetry closed 1 year ago

chris-telemetry commented 1 year ago

I've been testing out the LoggingMiddleware and I believe it's holding on to response data in memory. This is causing very high memory usage for some of our endpoints that return very large (~10-20 MB) responses. It happens even when I set response_log_fields=('status_code',).

I'm using starlite = {extras = ["full"], version = "1.50.2"}, so the logger should be using picologging.

Here's a snippet for you; the JSON file I'm using to test with is about 20 MB.

import json
from typing import Any

import uvicorn
from starlite import LoggingConfig, Starlite, post
from starlite.middleware import LoggingMiddlewareConfig


@post('/test')
async def test_endpt(data: Any) -> Any:
    # Load and return a ~20 MB JSON payload to exercise the logging middleware.
    with open("/Users/christophermoyer/code/scratch/test.json", "r") as f:
        data = json.load(f)
    return data


app = Starlite(
    route_handlers=[test_endpt],
    logging_config=LoggingConfig(),
    middleware=[LoggingMiddlewareConfig(response_log_fields=('status_code',)).middleware],
)

if __name__ == '__main__':
    uvicorn.run("main:app", host="0.0.0.0", port=8001, reload=True)
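
Snapshots like the ones below can be captured with a small debug route on the same app, roughly like this (the /memory route is illustrative, not necessarily the exact hook used for the numbers below, and would need to be added to route_handlers alongside test_endpt):

import tracemalloc
from typing import List

from starlite import get

tracemalloc.start()


@get('/memory')
async def memory_snapshot() -> List[str]:
    # Return the ten largest allocation sites, grouped by source line.
    snapshot = tracemalloc.take_snapshot()
    return [str(stat) for stat in snapshot.statistics('lineno')[:10]]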

Tracemalloc showing memory is being held on to (LoggingMiddleware enabled):

[ Top 10 ]
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/starlite/utils/serialization.py:161: size=1730 MiB, count=61, average=28.4 MiB
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py:208: size=64.0 KiB, count=1212, average=54 B
.pyenv/versions/3.11.0/lib/python3.11/json/decoder.py:353: size=61.8 KiB, count=675, average=94 B
.pyenv/versions/3.11.0/lib/python3.11/asyncio/locks.py:168: size=46.0 KiB, count=124, average=380 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/starlite/connection/request.py:89: size=45.8 KiB, count=61, average=768 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/starlite/utils/serialization.py:192: size=44.7 KiB, count=614, average=75 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/starlite/middleware/logging.py:160: size=43.5 KiB, count=366, average=122 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/h11/_util.py:92: size=37.4 KiB, count=612, average=63 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/starlite/middleware/logging.py:154: size=23.9 KiB, count=124, average=198 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py:244: size=13.9 KiB, count=122, average=117 B

Tracemalloc with LoggingMiddleware disabled:

[ Top 10 ]
/.pyenv/versions/3.11.0/lib/python3.11/json/decoder.py:353: size=19.7 KiB, count=316, average=64 B
/code/starlite-fiddle/main.py:10: size=3149 B, count=47, average=67 B
<frozen abc>:123: size=941 B, count=11, average=86 B
/.pyenv/versions/3.11.0/lib/python3.11/asyncio/base_events.py:1884: size=528 B, count=1, average=528 B
/.pyenv/versions/3.11.0/lib/python3.11/typing.py:844: size=470 B, count=4, average=118 B
/.pyenv/versions/3.11.0/lib/python3.11/email/utils.py:162: size=468 B, count=9, average=52 B
/.pyenv/versions/3.11.0/lib/python3.11/typing.py:690: size=432 B, count=6, average=72 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/anyio/to_thread.py:31: size=432 B, count=3, average=144 B
starlite-fiddle-LywfGICr-py3.11/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py:208: size=392 B, count=7, average=56 B
/.pyenv/versions/3.11.0/lib/python3.11/logging/__init__.py:1572: size=384 B, count=2, average=192 B

This causes memory usage to grow linearly with the total (uncompressed) size of the responses served. We caught this in production when memory spiked to 40 GB after deploying LoggingMiddleware. Any help would be hugely appreciated. We've disabled the middleware for now.

Goldziher commented 1 year ago

ouch, yes this is pretty bad. Will make this a priority.

chris-telemetry commented 1 year ago

Investigation led me to this function in starlite/middleware/logging.py as the culprit:

async def send_wrapper(message: "Message") -> None:
    if message["type"] == HTTP_RESPONSE_START:
        set_starlite_scope_state(scope, HTTP_RESPONSE_START, message)
    elif message["type"] == HTTP_RESPONSE_BODY:
        set_starlite_scope_state(scope, HTTP_RESPONSE_BODY, message)
        self.log_response(scope=scope)
    await send(message)

return send_wrapper

Commenting out set_starlite_scope_state(scope, HTTP_RESPONSE_BODY, message) solves the issue.

I tried overwriting the state after the response is logged:

async def send_wrapper(message: "Message") -> None:
    if message["type"] == HTTP_RESPONSE_START:
        set_starlite_scope_state(scope, HTTP_RESPONSE_START, message)
    elif message["type"] == HTTP_RESPONSE_BODY:
        set_starlite_scope_state(scope, HTTP_RESPONSE_BODY, message)
        self.log_response(scope=scope)
        set_starlite_scope_state(scope, HTTP_RESPONSE_START, '')
        set_starlite_scope_state(scope, HTTP_RESPONSE_BODY, '')
    await send(message)

But that didn't help. I may not understand how state is controlled though.

chris-telemetry commented 1 year ago

ouch, yes this is pretty bad. Will make this a priority.

Thanks for the quick response. There's a thread in the Discord where we discussed this; it may have some additional context if that helps.

Goldziher commented 1 year ago

So, the problem is that message is an ASGI message object, which is actually a dictionary, and hence a reference to it is kept in memory. The fact that this reference is saved inside another dictionary (the "scope" dictionary attached to the request instance) keeps the request alive and prevents it from being garbage collected. We have to either switch to using weak refs in this case or store only data that can be garbage collected.
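
A stripped-down illustration of the reference chain being described (simplified names, not the actual Starlite code):

# The scope dict outlives the response and (indirectly) holds the whole
# ASGI body message, so the body bytes can never be collected:
scope = {"state": {}}
message = {"type": "http.response.body", "body": b"x" * 20_000_000}
scope["state"]["http.response.body"] = message  # strong reference stashed away
del message  # the 20 MB are still reachable via `scope`

Storing only small, already-extracted values (such as the status code or the body length) instead of the message itself would break this chain.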

provinzkraut commented 1 year ago

I think another solution would be to slightly restructure the middleware so it doesn't need to store the message in the scope to begin with. If that can be done easily, it would be preferable imo.
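
A rough sketch of that idea in generic ASGI terms (all names here are illustrative, not Starlite's actual internals): log from the messages as they pass through and keep only small primitives, so nothing retains the body.

from typing import Any, Awaitable, Callable, Dict

Message = Dict[str, Any]
Send = Callable[[Message], Awaitable[None]]


def wrap_send(send: Send, log_response: Callable[[int, int], None]) -> Send:
    # Only tiny, immutable values are kept between messages.
    captured = {"status_code": 0, "body_size": 0}

    async def send_wrapper(message: Message) -> None:
        if message["type"] == "http.response.start":
            captured["status_code"] = message["status"]
        elif message["type"] == "http.response.body":
            captured["body_size"] += len(message.get("body", b""))
            if not message.get("more_body", False):
                # Log on the final body chunk; the chunks themselves are never
                # stored, so they can be freed as soon as they have been sent.
                log_response(captured["status_code"], captured["body_size"])
        await send(message)

    return send_wrapper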

Goldziher commented 1 year ago

@chris-telemetry - can you test this branch: 1137-bug-loggingmiddleware-stores-all-responses-permanently-in-memory-memory-leak? I will add a PR for it.

chris-telemetry commented 1 year ago

I'll test. Thank you!

chris-telemetry commented 1 year ago

@Goldziher It looks good to me. Used my prior test case of 60 requests with ~20MB responses.

[ Top 10 ]
/Users/christophermoyer/opt/anaconda3/lib/python3.9/json/decoder.py:353: size=24.0 KiB, count=318, average=77 B
/Users/christophermoyer/opt/anaconda3/lib/python3.9/abc.py:123: size=5022 B, count=54, average=93 B
/Users/christophermoyer/opt/anaconda3/lib/python3.9/asyncio/events.py:80: size=4728 B, count=12, average=394 B
/Users/christophermoyer/code/starlite/starlite/utils/extractors.py:159: size=3800 B, count=9, average=422 B
/Users/christophermoyer/code/starlite/starlite/parsers.py:65: size=2079 B, count=21, average=99 B
/Users/christophermoyer/Library/Caches/pypoetry/virtualenvs/starlite-fioynTQM-py3.9/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:208: size=1866 B, count=21, average=89 B
/Users/christophermoyer/code/starlite/starlite/middleware/exceptions/middleware.py:47: size=1456 B, count=3, average=485 B
/Users/christophermoyer/code/starlite/starlite/connection/request.py:138: size=1336 B, count=3, average=445 B
/Users/christophermoyer/Library/Caches/pypoetry/virtualenvs/starlite-fioynTQM-py3.9/lib/python3.9/site-packages/h11/_connection.py:411: size=1312 B, count=3, average=437 B
/Users/christophermoyer/Library/Caches/pypoetry/virtualenvs/starlite-fioynTQM-py3.9/lib/python3.9/site-packages/h11/_connection.py:470: size=1288 B, count=3, average=429 B

Goldziher commented 1 year ago

@Goldziher It looks good to me. Used my prior test case of 60 requests with ~20MB responses.

Great, we'll get a patch out asap

chris-telemetry commented 1 year ago

Incredible. I really appreciate it.