litestar-org / litestar

Production-ready, Light, Flexible and Extensible ASGI API framework | Effortlessly Build Performant APIs
https://litestar.dev/
MIT License

Bug: should raise config error for 'yield' dependencies with `cache=True`. #2771

Closed marcuslimdw closed 9 months ago

marcuslimdw commented 9 months ago

Description

When generators are used for dependency injection with use_cache=True, the generator itself, instead of the value it yields, is cached, which causes a StopIteration to be raised on all subsequent requests.

URL to code causing the issue

No response

MCVE

from litestar import Litestar, get
from litestar.di import Provide

class Service:
    def do(self) -> None:
        pass

def provider():
    yield Service()

@get("/")
async def handler(service: Service) -> None:
    service.do()

app = Litestar(
    debug=True,
    route_handlers=[handler],
    dependencies={"service": Provide(provider, use_cache=True)},
)

Steps to reproduce

1. Run the MCVE.
2. Send a GET request to `/`. It succeeds with 200 OK.
3. Send another GET request to `/`. It fails with 500 Internal Server Error (a scripted version of these steps follows below).
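
For illustration, a scripted version of these steps using Litestar's TestClient, assuming the MCVE above is importable as `app`:

from litestar.testing import TestClient

# Entering the client as a context manager runs the app's lifespan.
with TestClient(app=app) as client:
    assert client.get("/").status_code == 200  # first request: the generator runs
    assert client.get("/").status_code == 500  # cached, already-exhausted generator is reused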


Logs

No response

Litestar Version

2.3.2

Platform



peterschutt commented 9 months ago

What are you trying to do? The example you've given doesn't need to be a generator; "yield" dependencies are generally used to easily define startup and teardown code, e.g.:


class Service:
    def do(self):
        pass

    def close(self):
        print("close called")

def provider():
    yield (service := Service())
    service.close()

Make these changes to the example and you'll see that the teardown code is called after the request has been executed, and before any second request is made.

marcuslimdw commented 9 months ago

I know about yield dependencies and cleanup - it's the same pattern as pytest (which I really like, btw).

I feel like we might be miscommunicating here - I am aware that the cleanup code runs after the first request. That's not the problem; the problem is that if you hit the route again, you get a 500, because the cached dependency is not the value yielded by the dependency but the generator producing that value. I know that yield is not technically necessary here, but the MCVE is intended to illustrate the caching problem, not that there is actually something to clean up.

To put it another way, imagine what happens when we change yield service to return service (which means we omit cleanup). When we do that, second and subsequent calls do not fail.
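
For illustration, the return-based variant (everything else in the MCVE unchanged) - with use_cache=True this succeeds on every request, because the Service instance itself is cached:

def provider() -> Service:
    # Returning instead of yielding means the value, not a generator, gets cached.
    return Service()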

That seems, to me, to be clearly a bug caused by caching the wrong thing - when it's a normal dependency, it's the value, but when it's a yield dependency, it's the generator. From a UX perspective, shouldn't return and yield dependencies work the exact same way, except that yield dependencies allow cleanup? And that's exactly how it works when use_cache=False, but when use_cache=True, things break.

(as an aside, I actually created this bug because I encountered a deeper problem with yield dependencies + caching, but I think it can wait for another time)

peterschutt commented 9 months ago

My point is that trying to use something after its cleanup code has executed doesn't seem like a great idea.

I agree that caching the generator is not desired, but I'd like to know why we shouldn't just raise a configuration error on startup when we detect this.

marcuslimdw commented 9 months ago

That's the original point I was wondering about - is the current behaviour correct?

What I really want to model are long-lived dependencies, e.g. a connection pool that I create at startup. Preferably, I'd be able to initialise such a dependency lazily, cache it for the lifetime of my application, and have it cleaned up at shutdown. Currently, it seems that use_cache does not fit this use case.

I know that one way to achieve this is an on_startup hook that sets attributes on the app State object (which I would subclass to add my custom typed attributes); dependencies could then themselves depend on it (see the sketch after this list). So my questions really are:

  1. Is on_startup + app State the one blessed way to initialise and provide long-lived dependencies? (If so, then this should probably be a configuration error on startup)
  2. If not, should the interaction between yield dependencies and use_cache be changed so that the yielded value is cached and cleanup runs only at shutdown?
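
For illustration, a minimal sketch of the on_startup + State pattern described above; `ConnectionPool` and the attribute names are illustrative, not from Litestar:

from litestar import Litestar
from litestar.datastructures import State

class ConnectionPool:
    # Hypothetical stand-in for a long-lived resource such as a DB pool.
    async def close(self) -> None:
        ...

async def on_startup(app: Litestar) -> None:
    # Create the resource at startup and stash it on the app state.
    app.state.pool = ConnectionPool()

async def on_shutdown(app: Litestar) -> None:
    # Clean up once, at application shutdown.
    await app.state.pool.close()

def provide_pool(state: State) -> ConnectionPool:
    # `state` is a reserved kwarg that Litestar injects into dependencies.
    return state.pool

app = Litestar(
    route_handlers=[],
    on_startup=[on_startup],
    on_shutdown=[on_shutdown],
    dependencies={"pool": provide_pool},
)
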
peterschutt commented 9 months ago

OK, thank you for explaining.

  1. I would say that lifespan context managers would be the preferred approach to this nowadays.
  2. I think this would be a difficult change and we'd only try to make it if there was a problem that couldn't be solved in other ways.

So, I think that checking for the case of a 'yield' dependency with cache=True and raising a config error is the way to go. Thanks for raising this.

> What I really want to model are long-lived dependencies e.g. a connection pool that I create at startup. Preferably, I'd be able to initialise such a dependency lazily, cache it for the lifetime of my application, and have it cleaned up at shutdown. Currently, it seems that use_cache does not fit this use case.

This is an example from advanced-alchemy:

    @asynccontextmanager
    async def lifespan(
        self,
        app: Litestar,
    ) -> AsyncGenerator[None, None]:
        deps = self.create_app_state_items()
        app.state.update(deps)
        try:
            if self.create_all:
                await self.create_all_metadata(app)
            yield
        finally:
            await cast("AsyncEngine", deps[self.engine_dependency_key]).dispose()

It creates and adds sqlalchemy engine and session factory instances to the app state. Having those things instantiate lazily is unusual, but you could do something with a closure:

from contextlib import asynccontextmanager
from typing import AsyncGenerator

from litestar import Litestar
from litestar.datastructures import State

@asynccontextmanager
async def lifespan(app: Litestar) -> AsyncGenerator[None, None]:
    # `Pool` is a placeholder for whatever long-lived resource is being managed.
    pool_instance: Pool | None = None

    def get_pool() -> Pool:
        # `nonlocal` is required so assignment targets the enclosing variable
        # instead of creating a new local one.
        nonlocal pool_instance
        if pool_instance is None:
            pool_instance = Pool()
        return pool_instance

    app.state.update(get_pool=get_pool)
    try:
        yield
    finally:
        if pool_instance is not None:
            pool_instance.clean_up()

def provide_pool(state: State) -> Pool:
    return state["get_pool"]()

app = Litestar(..., lifespan=[lifespan], dependencies={"pool": provide_pool})

marcuslimdw commented 9 months ago

oh cool I didn't notice that - works for me! should I close this issue then and create one for the config change?

peterschutt commented 9 months ago

> oh cool I didn't notice that - works for me! should I close this issue then and create one for the config change?

I think it is fine to track that with this issue - I'll update the title.

provinzkraut commented 9 months ago

> I'd like to know why we shouldn't just raise a configuration error on startup when we detect this.

I was going to suggest the same thing. Caching a dependency with cleanup doesn’t make a lot of sense conceptually, and is most likely an antipattern, so we should just not allow this at all.
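
For illustration, a hypothetical sketch of such a startup check - not necessarily how Litestar implements it, and it assumes a Provide-like object that exposes the raw callable as `dependency` and the flag as `use_cache`:

from inspect import isasyncgenfunction, isgeneratorfunction

from litestar.exceptions import ImproperlyConfiguredException

def validate_provide(provide) -> None:
    # Reject 'yield' dependencies (sync or async generator functions) that
    # enable caching: caching the generator rather than its yielded value
    # is never what the user wants.
    if provide.use_cache and (
        isgeneratorfunction(provide.dependency)
        or isasyncgenfunction(provide.dependency)
    ):
        raise ImproperlyConfiguredException(
            "Cannot use `use_cache=True` with a 'yield' dependency"
        )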

marcuslimdw commented 9 months ago

I'm not sure if it's just my own mental model, but what I thought would happen with cached yield dependencies was that the yielded value would be cached and cleanup would run automagically on application shutdown, rather than after the first request (so caching behaviour would differ between return and yield dependencies).

In other words, I was using cached yield dependencies where I should have used lifespan. If other people have this misconception too, it might be worth making a change to the docs.