Open math2001 opened 5 years ago
Sequencer
was one of the very first pieces of async code I wrote, back before trio or nurseries existed, and has survived almost unchanged since then. So it's very possible that it could do with some refinement :-).
I like the idea of giving an explicit sequence of labels. However, I'd like it even better if they were descriptive labels, like strings!
s = Sequencer("task 1 start", "task 2 start", "task 1 exit", ...)
So I definitely wouldn't use : Number
as the type hint :-). Any interest in putting together a PR?
Re: how __call__
behaves when its wait is cancelled: my concern was that if one of the waits was cancelled (e.g., for example, due to a crash, or some kind of timeout), then all of the later waits would become deadlocked. If the wait was cancelled because the whole test was cancelled, then of course that's fine – all of the waits were cancelled. But I was worried about the case where someone (maybe accidentally) cancels just one of them. So I wrote the code to convert that into a RuntimeError
. But if we just raise RuntimeError
, then yeah that'll probably end up cancelling all the other waits... but then they'll all be converted into RuntimeError
too, so the user will get a big pile of RuntimeError
s obscuring whatever their real issue was.
I dunno, it might not be the best approach :-). Definitely feel free to open another issue to discuss more if you think we can do better. That's the background though.
Second question: isn't it bad to access an external state from a coroutine? call is called from multiple "parallel" coroutines, and it changes the state of a set? Or do we consider it safe because no one should be iterating on this set at all?
Of course you have to be careful when sharing state between tasks. But, the whole point of a synchronization primitive like this is to coordinate between tasks, so we're going to have to share some state :-). Fortunately, since Trio tasks are only rescheduled at checkpoints, and checkpoints only happen at await
, and set
operations don't involve await
... we know that all set
operations are atomic! It's true that if someone did a checkpoint while iterating across the set
:
for ... in self._claimed:
await ...
...then another task could invalidate their iterator by mutating the set
. But in the case of Sequencer
that can't happen, because we never iterate across the set
.
However, I'd like it even better if they were descriptive labels, like strings!
Hum... One the problem of this approach (letting the sequence ahead of time what to wait for) was that it added an extra cost when setting up the test: we have to specify sequence points twice, but that cost is compensated by the improvements in maintainability. But if we require strings, then the initial cost is going to much higher: we'll have to think of a name that expresses order (a number) and describes the sub-task in a succinct way, which I think is quite expensive for the value it brings to the readability of the test. But I definitely see now that it's more valuable that I originally thought.
Any interest in putting together a PR?
Yep, I'll definitely give this a go!
Thanks for the explanation :-) I think I've been a bit too pessimist with my code then... But hang on, I'm not sure. I'll think about it a bit more...
but then they'll all be converted into RuntimeError too, so the user will get a big pile of RuntimeErrors obscuring whatever their real issue was.
Yep, I got that with the tests, and it's a quite annoying... I'm not sure if there are that many cases where we actually need to explicitly cancel everything. I'll try to think about that as well :-)
I'm not sure how I should go about adding this into trio:
LabeledSequencer
for example)trio
and trio.testing
?)Sequencer
should know how to behave based on whether it's given labels when it's created.The third option is possible, but the hardest bit is going to be documenting it... In essence, the 2 sequencer work on different principles. Either you use it one way, or another, but you can't do a mix. So, we might as well do a different sequencer... So, from the user perspective, option 1.
But the problem with the first option is on the code's end: LabeledSequencer
is literally a copy paste of Sequencer
, with a few extra lines. That's duplicate logic, which is kind of bad.
So, here's what I propose: make an internal sequencer, that's undocumented, maybe _Seq
, that implements option 3. And then, we expose Sequencer
and LabelSequencer
that just use _Seq
's logic.
I feel like I might be over complicating quite a simple thing though 🤔
I feel like I might be over complicating quite a simple thing though 🤔
Yep, I was... LabeledSequencer
can just inherit from Sequencer
and simply convert labels to position by looking at the index in self._labels
.
Back to strings vs numbers for the labels, though. 😄
I adapted the example given in the documentation for Sequencer
to use LabeledSequencer
, and using strings, that's what we get:
import trio
import trio.testing
async def worker1(seq):
async with seq("worker 1 first"):
print(0)
async with seq("worker 1 second"):
print(4)
async def worker2(seq):
async with seq("worker 2 first"):
print(2)
async with seq("worker 2 second"):
print(5)
async def worker3(seq):
async with seq("worker 3 first"):
print(1)
async with seq("worker 3 second"):
print(3)
async def main():
seq = trio.testing.LabeledSequencer(
"worker 1 first",
"worker 3 first",
"worker 2 first",
"worker 3 second",
"worker 1 second",
"worker 2 second"
)
async with trio.open_nursery() as nursery:
nursery.start_soon(worker1, seq)
nursery.start_soon(worker2, seq)
nursery.start_soon(worker3, seq)
trio.run(main)
When I look at the workers, it's obvious that the first async with seq(...)
in worker1
is the worker 1's first sequence point, and same for all the other ones: the string message doesn't add any information. What I care about though is when it's going to run in the sequence, and if I want to know that, I have to find the initiating statement, see that worker 1 first
is going to run first (which is just a coincidence), scroll back up because I forgot what worker 1 first
was doing, and this for every sequence point.
OTOH, with numbers which indicate the order, we'd get something like this:
import trio
import trio.testing
async def worker1(seq):
async with seq(0):
print(0)
async with seq(40):
print(4)
async def worker2(seq):
async with seq(20):
print(2)
async with seq(50):
print(5)
async def worker3(seq):
async with seq(10):
print(1)
async with seq(30):
print(3)
async def main():
seq = trio.testing.LabeledSequencer(0, 10, 20, 30, 40, 50)
async with trio.open_nursery() as nursery:
nursery.start_soon(worker1, seq)
nursery.start_soon(worker2, seq)
nursery.start_soon(worker3, seq)
trio.run(main)
I find this much better because I can just look at the workers, and without even bothering finding the initiation (since I make the assumption that the labels are be sorted in ascending order --- again, this could (should) be enforced by trio), I can quickly understand the order in which things are going to run.
If we require string, what I would put in it (and what I'd assume most people will put in them) are just the position of the label ie. "first"
, "second"
, etc... Maybe with some extra information, but right now I can't think of anything that would make sense to be in the sequencer's labels. If some description is required, then a comment would probably be more appropriate.
What do you think?
@math2001 would you share your test's use case for Sequencer?
I haven't needed it so far. My theory is that in most cases tests be structured more clearly and explicitly without it.
This Sequencer is such a wonderful idea, but it's quite a painful process to change some code that already uses it: ie. when you want to add a sequence point N in the middle of an existing chain of event, you have to increase every position greater than N by one, which is quite annoying.
This could be avoided by letting the Sequencer know ahead of time which position it will have to wait for. For example:
The numbers become just labels, instead of holding an actual meaning to the sequencer. What matters is their position in the initiation. With this, if I want to add a sequence point between 0 and 10, I can just add an
async with seq(5)
somewhere, add 5 between 0 and 10 in the initiation, and it's done. Users could split even further down using floats. Again, the type could be anything (strings, whatever), it's just that numbers make the desired order more obvious, without constantly referring to the initiating statement.I've tried to do a basic implementation inspired from how the Sequencer currently works, without any safety net.
In use:
Anything I missed?
Nothing prevents the user from doing something like
Seq(10, 0, 20)
, which would make 10 run before 0, but that can be easily added as a requirement in__init__
by comparing the labels and it's sorted version.I've also got a few questions about the current implementation (not sure if this should be in the forum or in an other issue):
https://github.com/python-trio/trio/blob/4a725b67a79b9f35668bacac9c45867c9e27d885/trio/testing/_sequencer.py#L78-L79
Why? Doesn't this create a massive chain reaction of things going off all at once, which the user explicitly said didn't want to happen? The sequence point would eventually end up being cancelled on their own any way, with the error propagating up the nursery and cancelling every children. My best guess is that this is a safety to prevent a coroutine from blocking because the user caught a
trio.Cancelled
somewhere themselves.Second question: isn't it bad to access an external state from a coroutine?
__call__
is called from multiple "parallel" coroutines, and it changes the state of aset
? Or do we consider it safe because no one should be iterating on this set at all?