litestar-org / litestar

Production-ready, Light, Flexible and Extensible ASGI API framework | Effortlessly Build Performant APIs
https://litestar.dev/
MIT License
5.51k stars 376 forks source link

Question: problems with sqlalchemy repository update and lazy loading #1895

Closed patrickarmengol closed 1 year ago

patrickarmengol commented 1 year ago

I'm having trouble reading a field representing a many-to-many relationship after using the repo update method. This is what I found out so far.

My sqla repository has statement=select(Item).options(selectinload(Item.tags)) in order to include the tags in the response, which works great for the get and list methods.

When setting for the lazy arg for the tags relationship defined for the Item model, AKA configuring the loader strategy at mapping time :

sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Item at 0x7fb31aaea590> is not bound to a Session; lazy load operation of attribute 'tags' cannot proceed (Background on this error at: https://sqla
lche.me/e/20/bhk3)

I read through the above error help. I already have expire_on_commit=False and my sqla repository has statement=select(Item).options(selectinload(Item.tags)), but for some reason update doesn't care about the statement options even though it calls get internally, and, from what I can tell, instead uses the loader strategies configured at mapping time.

Here's some of my code:

item_tag = Table(
    "item_tag",
    UUIDBase.metadata,
    Column("item_id", ForeignKey("item.id"), primary_key=True),
    Column("tag_id", ForeignKey("tag.id"), primary_key=True),
)

class Item(UUIDBase):
    name: Mapped[str] = mapped_column(String(), unique=True)
    description: Mapped[str | None]
    tags: Mapped[list[Tag]] = relationship(
        secondary=item_tag,
        back_populates="items"
        lazy="noload"  # <-- here be problems
    )

class Tag(UUIDBase):
    name: Mapped[str] = mapped_column(String(), unique=True,
    items: Mapped[list[Item]] = relationship(
        secondary=item_tag,
        back_populates="tags",
        lazy="noload"
    )

@patch(
    path="/{item_id:uuid}",
)
async def update_item(
    item_repo: ItemRepository,
    tag_repo: TagRepository,
    data: ItemUpdate,
    item_id: UUID = Parameter(
        title="Item ID",
        description="The item to update",
    ),
) -> ItemDB:
    dd = data.dict(exclude_unset=True)
    dd.update({"id": item_id})
    if "tag_names" in dd:
        if len(dd["tag_names"]) != 0:
            dd["tags"] = await tag_repo.list(
                CollectionFilter("name", dd["tag_names"])
            )
        else:
            dd["tags"] = []
        del dd["tag_names"]
    obj = await item_repo.update(Item(**dd))
    await item_repo.session.commit()
    return parse_obj_as(ItemDB, obj)  # <-- here be errors

Hopefully that was clear and I provided enough context. I'm a bit lost on how to solve this and wondering if it may indicate an issue with the method's implementation itself.

Funding

Fund with Polar

cofin commented 1 year ago

Out of curiosity, does this issue persist if the refresh method on the update is not called?

@peterschutt, this error looks familiar to me, but I haven't seen it in a while. This may have something to do with why I removed the refresh.

patrickarmengol commented 1 year ago

Out of curiosity, does this issue persist if the refresh method on the update is not called?

@peterschutt, this error looks familiar to me, but I haven't seen it in a while. This may have something to do with why I removed the refresh.

I initially noticed this on v2.0.0beta2, that didn't have the refresh. I updated the litestar dep to point to main and the behavior persists.

I'll try to set up a test repo.

cofin commented 1 year ago

I've actually almost finished setting up the test case. I'll link the PR here in just a moment in case you want to test as well.

cofin commented 1 year ago

sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Item at 0x7fb31aaea590> is not bound to a Session; lazy load operation of attribute 'tags' cannot proceed (Background on this error at: https://sqla lche.me/e/20/bhk3)

This error is likely due to the call of expunge on the repo methods.

Re: the others, I'm wondering if there are additional arguments that should be added to the refresh call. I'll continue to investigate.

Also, I've got a test case setup that works on all of the backends, and it does produce the behavior you describe.

cofin commented 1 year ago

I've been able to narrow this down to the use of the refresh call.

The expunge and refresh repository calls can now be optionally skipped base on __init__ parameters. I ran through the tests with the various flags set and here's what I found:

With expunge=False and refresh=True, calls using a synchronous repository work, but asyncio fails due to the lack of lazy loading support.

With both set to False and the session set to expire_on_commit=False, I don't see any issues for sync or asyncio.

cofin commented 1 year ago

I'm proposing that we expose make the expunge, flush, and refresh options a bit more configurable.

In this case, you would continue to have expire_on_commit=False enabled, but you can also override the repository behavior an initialization or method-by-method. Your call might look something like this now:

To set for all calls to the repository:

item_repo = ItemRepository(auto_refresh=False, auto_expunge=False)
obj = await item_repo.update(Item(**dd))

And to override a single method call:

item_repo = ItemRepository()
obj = await item_repo.update(Item(**dd), auto_refresh=False, auto_expunge=False)
patrickarmengol commented 1 year ago

@cofin Hey massive thanks for your work here. I like the configurable kwargs pattern, though I suspect it would be a bit difficult to document or explain the cases where modifying it is necessary.

I got a chance to experiment with the changes. Somehow, I'm not seeing issues via the unit tests even for a special case in my fork of the litestar repo, but in practice I can't seem to avoid errors for any combination of kwargs when utilizing update in a route handler method. When trying to do a partial update where the update data doesn't include the relationship field (like just updating name, not tags), I get the following error when trying to lazy load the relationship field from the updated object:

sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

Here's the affected code in my repo for reference: https://github.com/patrickarmengol/lsdemo/blob/master/src/lsdemo/routes.py#L71

Perhaps I'm missing something simple. My understanding of this bit of sqlalchemy is quite limited. Any help is appreciated.

cofin commented 1 year ago

@patrickarmengol, happy to help here.

Is this still an error for you?

patrickarmengol commented 1 year ago

@patrickarmengol, happy to help here.

Is this still an error for you?

Thanks, yes please, if you could. I added some installation instructions to the lsdemo repo if you want to see what I'm seeing. The terms I'm using will reference that project.

Here are the steps I'm taking to reproduce:

  1. create a tag
  2. create an item with that tag
  3. update only the description of the item

The problem occurs when trying to read the tags in the returned model. It's giving me a lazy-load error, even though I'm eagerly loading the tags in the statement for the repo (statement=select(Item).options(selectinload(Item.tags))) which is used in the get() call inside update(). Perhaps the object returned from the session.merge() loses that? I'm quite confused...

I've been trying to experiment with rewriting the update method for the repo. Is there a particular reason session.merge() is used over a sqlalchemy update() statement or even just iterating over the update data and calling setattr()?

EDIT: Just for clarification: