I have a pattern where I read from a queue with a timeout, generally like this:
```python
while True:
    reader = asyncio.async(asyncio.wait_for(queue.get(), 0.1))
    try:
        item = (yield from reader)
    except asyncio.TimeoutError:
        reader.cancel()
        continue
```
This is to have a loop where we try to get items from the queue with a timeout. Prior to Python 3.5, wait_for() doesn't automatically cancel the reading task, so I have to do it explicitly above.
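For comparison, on Python 3.5 and later the same pattern can be written without the explicit cancel, since wait_for() cancels the inner future on timeout by itself. A minimal sketch using modern syntax (the helper name is mine):

```python
import asyncio

async def get_with_retry(queue, timeout=0.1):
    # Poll the queue until an item arrives; wait_for() cancels the
    # inner get() on its own when the timeout fires (Python 3.5+).
    while True:
        try:
            return await asyncio.wait_for(queue.get(), timeout)
        except asyncio.TimeoutError:
            continue
```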
The code has a strange race condition where, if a task calls put_nowait() just as the reader task is about to be cancelled due to the timeout, the item that was put onto the queue gets lost.
In the tests framework, the minimal test case I can come up with is mainly this code:
```python
q = asyncio.Queue(loop=loop)
reader = loop.create_task(q.get())
loop.run_until_complete(asyncio.sleep(0.01, loop=loop))
q.put_nowait(1)
q.put_nowait(2)
reader.cancel()
```
When the reader gets cancelled, the item `1` is lost into the ether, and when you create another reader for the same queue, it only gets the second item, `2`. I would expect that, if a reader task is cancelled, the item at the head of the queue doesn't get lost. Either the reader succeeds and the item is removed, or it doesn't succeed and the item is left alone.
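For reference, the same reproduction in modern asyncio syntax (the original uses the 3.4 loop-based API). On interpreters that include the eventual fix for this issue, no item is lost and both values can still be retrieved:

```python
import asyncio

async def main():
    q = asyncio.Queue()
    reader = asyncio.ensure_future(q.get())
    await asyncio.sleep(0.01)  # let the reader block on the empty queue
    q.put_nowait(1)            # wakes the reader, but only via a scheduled callback
    q.put_nowait(2)
    reader.cancel()            # cancel before the reader actually resumes
    # With the fix, item 1 stays at the head of the queue:
    return [await q.get(), await q.get()]

print(asyncio.run(main()))
```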
I attach a patch to the tulip tests that reproduces the problem in both Python 3.4.2 and tulip hg.
Looks like a valid bug report, I like the test you provided, and the fix seems on the right track. Comments on the fix:
I'd really like to see a Rietveld diff for both patches.
Are there other places where a cancellation can have a similar effect? Maybe the same logic in put()?
The fix uses waiter.result() but if waiter itself is cancelled this will raise a CancelledError. This is probably fine because in that case no value has been put into it. But now this got me thinking -- how can the yield-from in get() receive a CancelledError without the waiter being cancelled? IOW how does the fix actually work??? (Maybe because the generator has already returned when the Task wrapping it is cancelled?)
I created a codereview issue: https://codereview.appspot.com/222930043
- Are there other places where a cancellation can have a similar effect? Maybe the same logic in put()?
Hm.. I didn't look, but yes, it does look like it might be affected by the same issue. I'll try to create a test for that to confirm.
how can the yield-from in get() receive a CancelledError without the waiter being cancelled?
Well, it definitely gets a CancelledError in the return (yield from waiter). I added extensive logging and could confirm this is the case.
What happens is that the task is running the coroutine q.get(); the CancelledError is raised at `return (yield from waiter)` only because that is the place where the Queue.get task was suspended, but the waiter itself is not cancelled.
Makes sense. I'll be waiting for your updated patch. Thanks for both the bug report and the fix!
So I uploaded a new patch version fixing a similar problem in put().
I'm sorry, I don't have time to review this (and it's subtle enough that I don't want to approve it without understanding).
Maybe Victor understands?
queue_bug.py: script to reproduce the bug.
I confirm that Queue.get() sometimes loses items when it is cancelled. The waiter contains the result, but the waiter is lost when get() is cancelled.
The Queue.get() waiter got a result, but the Queue.get() wakeup is only *scheduled* at that point. The wakeup may be executed later in the same loop iteration, or in the next one.
I didn't review patches yet.
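The scheduling point above can be demonstrated with plain futures (a sketch, not the Queue internals): setting a future's result does not resume the awaiting task immediately; the wake-up is only scheduled on the event loop.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    resumed = []

    async def waiter():
        await fut
        resumed.append(True)

    task = asyncio.ensure_future(waiter())
    await asyncio.sleep(0)   # let waiter() block on fut
    fut.set_result(None)
    # The result is set, but waiter() has not run yet:
    before = list(resumed)   # still empty here
    await task               # give the loop a chance to resume waiter()
    return before, resumed

print(asyncio.run(main()))
```

A cancellation arriving in that window, between set_result() and the scheduled wakeup, is exactly where the item gets lost.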
Don't know if it helps, but I made a github pull request for this:
New changeset 7aa2d3e1c885 by Yury Selivanov in branch '3.4': Issue bpo-23812: Fix asyncio.Queue.get() to avoid loosing items on cancellation. https://hg.python.org/cpython/rev/7aa2d3e1c885
New changeset d5644d7e222d by Yury Selivanov in branch '3.5': Issue bpo-23812: Fix asyncio.Queue.get() to avoid loosing items on cancellation. https://hg.python.org/cpython/rev/d5644d7e222d
New changeset 8f581da70ccd by Yury Selivanov in branch 'default': Issue bpo-23812: Fix asyncio.Queue.get() to avoid loosing items on cancellation. https://hg.python.org/cpython/rev/8f581da70ccd
The fix is committed. Closing the issue. Thanks a lot, Gustavo!
New changeset 024d4f4011c9 by Yury Selivanov in branch '3.4': Issue bpo-23812: Fix getter-cancellation with many pending getters code path https://hg.python.org/cpython/rev/024d4f4011c9
New changeset 2752fe734bfb by Yury Selivanov in branch '3.5': Merge 3.4 (issue bpo-23812) https://hg.python.org/cpython/rev/2752fe734bfb
New changeset e54c684e6788 by Yury Selivanov in branch 'default': Merge 3.5 (issue bpo-23812) https://hg.python.org/cpython/rev/e54c684e6788
Guido, Victor,
I've just pushed a commit to fix a misspelled method call in queues.py (related to the previous commit in this issue).
Along with fixing the bug and writing a unittest for it, I discovered an issue with the current queues design.
Here's an outline of the new unittest:
- we have a queue with three queue.get() coroutines waiting for its items: g1, g2, g3;
- we do put_nowait(i1) and put_nowait(i2); we cancel g1;
- the loop gets the chance to run;
- g1 detects that it was cancelled and pushes the item i1 back to the front of the queue (otherwise we just lose it).
Now, the problem is that g2 has already received i2 (via put_nowait); and i1 will end up consumed by g3!
This all means that with getter coroutines being cancelled, it's possible that getters might receive queue items out of order.
So the question is: is this a matter of documenting this behaviour, or we should consider some other design for queues implementation?
Honestly, I've lost track of the queue design. Maybe the push-back on cancellation is just wrong? After all, if a coroutine has received an item, it's out of the queue, even if it gets cancelled before it can do anything with the item.
Honestly, I've lost track of the queue design. Maybe the push-back on cancellation is just wrong? After all, if a coroutine has received an item, it's out of the queue, even if it gets cancelled before it can do anything with the item.
I think the push-back is the right thing. At least, I wouldn't expect `wait_for(queue.get(), 0.1)` code to lose queue items sometimes in big codebases.
I don't think the order for multiple concurrent getters matters that much. By analogy with the threading case, if multiple threads are blocked get()ing an item from the same queue, I would not presume to expect anything about which thread receives which item.
That being said, I'm sorry my patch caused this problem, but losing items is an infinitely worse problem.
That being said, I think the problem stems from the way queue get is designed. Currently it is something like:
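The code snippet appears to have been lost in the migration. A hedged reconstruction of the pre-fix design, where the waiter future itself carries the item, might look like this (class and method names are mine):

```python
import asyncio
import collections

class OldStyleQueue:
    """Sketch of the old design: put_nowait() hands the item directly
    to the first waiting getter's future."""

    def __init__(self):
        self._queue = collections.deque()
        self._getters = collections.deque()

    async def get(self):
        if self._queue:
            return self._queue.popleft()
        waiter = asyncio.get_running_loop().create_future()
        self._getters.append(waiter)
        # If the Task awaiting this is cancelled after set_result() ran
        # but before this coroutine resumes, the item inside the future
        # is lost -- which is the bug reported here.
        return await waiter

    def put_nowait(self, item):
        while self._getters:
            getter = self._getters.popleft()
            if not getter.done():
                getter.set_result(item)  # the item travels inside the future
                return
        self._queue.append(item)
```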
A better design is to make it so the future that get() is waiting for doesn't actually receive the item; it is only used to "wake up" the get() coroutine. It would be something like:
1. get(): in case the queue is empty, create a Future, add it to `_getters`, then begin waiting for this future;
2. when an item is being put():
   2.1. add the item to the internal queue;
   2.2. take the first getter and set its result to None;
3. the get() coroutine resumes, and then takes an item from the internal queue, returning it.
Clearly, in this design, handling cancellation in get is much simpler: you don't need to do anything, just let it cancel, no cleanup action needed.
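A minimal sketch of that design (names are mine; as noted later in the thread, the real patch still needs some extra cancellation cleanup, which this sketch omits):

```python
import asyncio
import collections

class WakeupQueue:
    """Sketch of the proposed design: the getter future is only a
    wake-up signal; items stay in the internal deque until the resumed
    get() coroutine pops one."""

    def __init__(self):
        self._queue = collections.deque()
        self._getters = collections.deque()

    async def get(self):
        while not self._queue:
            waiter = asyncio.get_running_loop().create_future()
            self._getters.append(waiter)
            try:
                await waiter              # carries no item, just "wake up"
            finally:
                if waiter in self._getters:
                    self._getters.remove(waiter)
        # The item was never inside the future, so a cancellation between
        # the wake-up and this line cannot lose it.
        return self._queue.popleft()

    def put_nowait(self, item):
        self._queue.append(item)          # 2.1: item goes into the deque first
        while self._getters:
            getter = self._getters.popleft()
            if not getter.done():
                getter.set_result(None)   # 2.2: wake the first getter
                break
```

With this structure, cancelling a getter after a put_nowait() leaves the item in the deque, where the next get() finds it.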
I like this a lot. Do you want to draft a patch? :)
Sure, just give me a couple of days.
So this looks like it will miss 3.5.0rc1. How confident are we that the new patch won't introduce new bugs? This late in the release process that would be awkward.
I was wrong, there still needs to be some cleanup in cancellation, even with the new approach. But it does solve the out-of-order problem.
I don't know if it should be applied to rc1. I wish I had more time to test. Up to you guys.
Guido, I agree, let's not push the updated implementation in 3.5.0.
Gustavo, could you please generate the patch with "hg diff", so that code review here works? And I think we need a new issue to track the new patch.
I am not using hg anymore, since asyncio migrated to git.
Here's a github PR, does that help? https://github.com/python/asyncio/pull/260
Yep, GH works. Thanks!
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields: ```python assignee = None closed_at =
created_at =
labels = ['deferred-blocker', 'type-bug', 'expert-asyncio']
title = 'asyncio.Queue.put_nowait(), followed get() task cancellation leads to item being lost'
updated_at =
user = 'https://bugs.python.org/gustavo'
```
bugs.python.org fields:
```python
activity =
actor = 'yselivanov'
assignee = 'none'
closed = True
closed_date =
closer = 'yselivanov'
components = ['asyncio']
creation =
creator = 'gustavo'
dependencies = []
files = ['38740', '38741', '38820', '40142']
hgrepos = []
issue_num = 23812
keywords = ['patch']
message_count = 24.0
messages = ['239611', '239624', '239625', '239627', '239628', '239679', '239738', '240006', '246762', '248055', '248056', '248137', '248138', '248144', '248145', '248146', '248148', '248150', '248157', '248181', '248187', '248203', '248206', '248207']
nosy_count = 5.0
nosy_names = ['gvanrossum', 'gustavo', 'vstinner', 'python-dev', 'yselivanov']
pr_nums = []
priority = 'deferred blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23812'
versions = ['Python 3.4', 'Python 3.5']
```