python-gino / gino

GINO Is Not ORM - a Python asyncio ORM on SQLAlchemy core.
https://python-gino.org/
Other
2.67k stars 150 forks source link

Fix issue with concurrent contextual stack modification #747

Closed uriyyo closed 3 years ago

uriyyo commented 3 years ago

We are using gino in production, and we had a lot of unexpected exceptions from asynpg with a message:

asyncpg.exceptions._base.InterfaceError: cannot perform operation: another operation is in progress

The problem was, that these exceptions were unpredictable and we had no idea what we a doing wrong.

Basically, we a using asyncio based web framework and we had a lot of code patterns like this:

db = Gino()

def worker(entity):
     async with db.acquire(reuse=False):
          ... # do some database queries based on entity

def main():
     async with db.acquire(reuse=False):
          entities = await Entity.query.gino.all()
          tasks = [create_task(worker(entity)) for entity in entities)]
          # do some useful thing concurrently to started tasks
          result = await Entity.query.where(...).gino.all()  # make another query to db
          await gather(*tasks)

And sometimes we had an exception at this place:

result = await Entity.query.where(...).gino.all()  # make another query to db

This exception cancels one of the running worker tasks.

Today after hours of debugging I have found the root cause of this issue🥳

The problem was with the concurrent _ContextualStack._ctx value modification. When we are creating a new asyncio task we copy current contextvars to the created task. In a case when there are reusable connections it will copy deque of those connections to the context of a child task. In a case when we acquire a new DB connection in a child task it will modify the deque of reusable connection which is a completely same deque that are using at main task.

I hope you understood the issue)

The solution that I used is to use an immutable single linked list as a contextual stack, in such case concurrent tasks won't mutate the stack of each other.

@fantix It will be great for me and my team if you can review this PR and release new version of a gino which will include this fix, so we can remove the workaround from our code that we currently have.

fantix commented 3 years ago

Thanks for the very detailed explanation and the PR, and sorry for the trouble! It does make sense to me - I'd like to further check the code, at the same time would you mind fixing the syntax for Python 3.5 pls?

uriyyo commented 3 years ago

I have fixed python 3.5 syntax error. Sorry about that, I forget that gino supports version 3.5)

uriyyo commented 3 years ago

Hi @fantix, Do you have time to look at this PR?)

suleimanmahmoud commented 3 years ago

Great work @uriyyo

I was having the same issue over similar code steps and your PR fixed it.

fantix commented 3 years ago

Oh sorry for the late reply - I'll take a deeper look tonight.

uriyyo commented 3 years ago

Hi @fantix

Any updates? Had you a chance to take a look at this PR?

wwwjfy commented 3 years ago

He's probably been busy.

While I do understand the issue, I ponder if there is an easier way to fix it, as linked list seems a bit overkill in this case. I'll take some time to experiment with a non-sleepy mind tomorrow.

fantix commented 3 years ago

Oh thank you Tony! Yeah I'm sorry that I've been on a move and trying to put up a new uvloop release.

wwwjfy commented 3 years ago

While this PR can fix the case in the test, the stack is still shared with tasks created by current coroutine. This could cause the same problem, for example, when a connection is created and then reused in the new coroutines.

Ideally, in my mind, a new coroutine can automatically reset contextvars to empty. But it's not how contextvars works. Instead the new coroutine inherits the current one.

The solution I can think of now is to create task with _ctx reset to None before the actual execution. Gino can probably expose a method to do that. Any thought?

(Another option is to use loop.call_soon(task(), context=contextvars.Context()) to run it with an empty Context, but that would lose other contextvars too.)

uriyyo commented 3 years ago

Hi @wwwjfy, @fantix

I have reimplemented this PR. I agree that the previous solution with a linked-list can be overkill for this issue.

With the new approach contextual stack is bound to task where it was created, in a case when ctx is using in another task then ctx will be rewritten.

While this PR can fix the case in the test, the stack is still shared with tasks created by current coroutine. This could cause the same problem, for example, when a connection is created and then reused in the new coroutines.

This issue also resolved in this PR.

What is your opinion regarding the new implementation?

wwwjfy commented 3 years ago

Looks good. One concern is if we add support to trio later, we need to apply similar strategy there.

BTW, can you rebase on latest master, which fixed CI?

uriyyo commented 3 years ago

Hi Tony, I have rebased on master. Thanks for feedback regarding implementation🙂

wwwjfy commented 3 years ago

Thanks for the fix!