python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io

Add timeout support to the test harness #168

Open njsmith opened 7 years ago

njsmith commented 7 years ago

It looks like twisted uses a 2 minute timeout by default, and the API for overriding that is to set a .timeout attribute on the test function.

I guess for #27 purposes we could stick with the 2 minute default ("it's traditional"), and then provide the usual pytest rigmarole for making the default overridable (config file, pytest.mark).
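
For illustration, a sketch of what the per-test override might look like, modeled on pytest-timeout's existing `@pytest.mark.timeout` marker; the `trio_timeout` name is hypothetical, not an API this issue has settled on (a config-file option would then set the default that this marker overrides):

```python
import pytest

# Hypothetical per-test override, by analogy with pytest-timeout's
# @pytest.mark.timeout marker; "trio_timeout" is an illustrative name only.
@pytest.mark.trio_timeout(30)
async def test_handshake():
    ...
```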

njsmith commented 7 years ago

(And of course it should be an argument to @trio_test, for those using it manually)

njsmith commented 7 years ago

Oh, duh, there's a huge issue with the naive implementation though: if someone is using a non-default clock inside their test, then the test timeout should still use the regular clock.

There are two approaches that jump to mind, and I'm not sure which would be better.

What twisted does, and I think @glyph would advocate, is to support multiple clocks in general. So like every timeout has a clock associated with it, and the clock passed to run becomes the "default clock", rather than the only clock. This is... somewhat complicated. At the API level it's not too bad, it would just mean attaching a clock to each cancel scope. (Unless we want to allow the same cancel scope to simultaneously have multiple deadlines on multiple different clocks, but that seems like it might be overkill.) And then propagate this through to the various APIs that use deadlines, like sleep_until. Internally, we would need to start keeping a separate timeout queue for each clock, and then when sleeping we'd have to loop through all the timeout queues and ask them each for how long we can sleep for.
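
A rough sketch of that internal bookkeeping, assuming a trio.abc.Clock-shaped interface where deadline_to_sleep_time() converts a deadline on the clock's timescale into real seconds to sleep; the queue-per-clock structure and class names here are illustrative, not trio internals:

```python
import heapq
import math
import time

class MonotonicClock:
    # stand-in for a trio.abc.Clock-style clock
    def current_time(self):
        return time.monotonic()

    def deadline_to_sleep_time(self, deadline):
        return max(deadline - self.current_time(), 0.0)

class HalfSpeedClock:
    # a second timescale, ticking at half speed, so the two queues
    # below genuinely live on different clocks
    def current_time(self):
        return time.monotonic() / 2

    def deadline_to_sleep_time(self, deadline):
        # convert back into real seconds to sleep
        return max((deadline - self.current_time()) * 2, 0.0)

def next_sleep(timeout_queues):
    # one sorted deadline queue per clock; ask each clock how long we
    # may sleep before its earliest deadline, and take the minimum
    sleep = math.inf
    for clock, heap in timeout_queues.items():
        if heap:
            sleep = min(sleep, clock.deadline_to_sleep_time(heap[0]))
    return sleep

fast, slow = MonotonicClock(), HalfSpeedClock()
queues = {fast: [], slow: []}
heapq.heappush(queues[fast], fast.current_time() + 5.0)  # 5 real seconds away
heapq.heappush(queues[slow], slow.current_time() + 1.0)  # 2 real seconds away
print(next_sleep(queues))  # ~2.0: bounded by the slow clock's deadline
```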

The main argument for this is that there genuinely are multiple distinct clocks – time.monotonic's timescale and calendar time are simply different, incommensurable scales, and it makes sense to want to be able to say "this certificate is expiring at midnight on January 30, so I want to sleep until midnight on January 29 before renewing it". (This is complicated because of things like NTP and suspends – time.monotonic stops ticking while the computer is asleep.) Another use case @glyph mentioned last week is if you want to – for some reason – make part of your code "run slower". I'm not sure how much importance (if any) to assign this use case, but ok sure whatever.

HOWEVER, on further thought I think this isn't as related to the test timeout problem as it might at first seem? Multiple timescale support is interesting conceptually, but needs an extra level of indirection compared to "simple" multiple-clock support: we might want to say "sleep until the calendar time is ...", but then for tests we'll want to be able to switch which clock is used to measure calendar time, and for the "make this code run slower" use case you'd want to be pulling the "default clock" out of task-local storage or similar. And crucially, for the test timeout use case, you really don't want to let that clock be overridable or anything.

This seems like a lot of API surface area and tricky design decisions to drag in just for having test timeouts. And the calendar time idea is cute, but extremely tricky to implement and AFAIK literally nobody else actually does implement this, so it's probably not our highest priority given that we don't yet support, like, basic TCP server helpers or subprocesses.

The other approach is to provide a hard-coded thing just for the test case, like an extra argument to run that gets handled directly inside the run loop. This wouldn't be too hard to do (the low-level timeout calculation is already a bit tricky, and this would be simpler to implement than wait_all_tasks_blocked). And in the future if we ever add multiple clock support, it could become a convenience alias for wrapping the main function in a cancel scope with the given timeout.
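
Today that convenience alias could be approximated in user code with trio's real cancel-scope API; a minimal sketch (the run_with_timeout name is mine, and note it runs on whatever clock trio.run was given, which is exactly the mock-clock problem above):

```python
import trio

def run_with_timeout(async_fn, seconds, *args):
    # wrap the main function in a cancel scope with a timeout;
    # trio.fail_after raises trio.TooSlowError if it expires
    async def wrapper():
        with trio.fail_after(seconds):
            return await async_fn(*args)
    return trio.run(wrapper)

async def main():
    await trio.sleep(0.1)
    return "ok"

print(run_with_timeout(main, 120))
```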

...This also makes me realize that there's some question about what semantics we want, as well. What twisted uses is a simple global timeout for the whole test, after which I assume it just stops executing, abandoning any callback chains mid-flight. For many purposes an idle-time timeout makes more sense ("if all tasks have been asleep for X seconds, cancel the test") – in particular, one can set a pretty aggressive timeout for this so that frozen tests fail fast, but big complicated tests that just do a lot of stuff won't trigger it. However, we probably also want a single cumulative timeout ("if the test started X seconds ago, cancel it"), to catch cases like the ssl test that's mentioned in bpo-30437, where a misbehaving test was constantly sending data in an infinite loop, so it never went idle but would have responded to an absolute timeout.

Having both is also nice because if they give different errors, then it gives you an immediate clue whether your bug is a deadlock or a livelock.
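
A sketch of both semantics layered on public APIs, using trio.testing.wait_all_tasks_blocked's cushion argument to detect idleness; the harness shape and exception names are illustrative only, and (as argued above) a real harness would need all of this to run on the real clock, not a mock:

```python
import trio
from trio.testing import wait_all_tasks_blocked

class TestIdleTimeout(Exception):
    """All tasks stayed blocked too long: smells like a deadlock."""

class TestTotalTimeout(Exception):
    """The test ran too long overall: smells like a livelock."""

async def run_test(test_fn, idle_timeout=5, total_timeout=120):
    result = {}
    async with trio.open_nursery() as nursery:

        async def idle_watchdog():
            # returns once every other task has been blocked for at
            # least idle_timeout seconds
            await wait_all_tasks_blocked(cushion=idle_timeout)
            result["error"] = TestIdleTimeout()
            nursery.cancel_scope.cancel()

        async def total_watchdog():
            await trio.sleep(total_timeout)
            result["error"] = TestTotalTimeout()
            nursery.cancel_scope.cancel()

        nursery.start_soon(idle_watchdog)
        nursery.start_soon(total_watchdog)
        await test_fn()
        nursery.cancel_scope.cancel()  # test finished: stop the watchdogs

    if "error" in result:
        raise result["error"]
```

Because each watchdog raises its own exception type, a failure immediately tells you which kind of hang you're looking at, per the point above.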

Another advantage of implementing this directly in the run loop is that it can in principle abandon the run like (I assume) twisted does, breaking our normal rule that task stacks always get properly unwound. I don't feel like this is super valuable in practice though; sure, there will be the occasional case where a test is receiving but ignoring Cancelled exceptions, but this is very rare. (I've seen trio's tests hang many many times, but I haven't yet had a case that didn't respond to control-C. At least now that the control-C code is debugged ☺.) And abandoning the run without doing cleanup only helps if a test is ignoring Cancelled but is executing checkpoints (it doesn't help if it's just doing a while True: pass), plus it can do things like leave threads running in the background. So really the advantages of abandonment over proper cancellation seem minimal.

So some conclusions I think I've argued myself into:

- the test timeout machinery should run on the real clock, even when the test itself uses a mock clock
- we probably want both an idle timeout and a cumulative timeout, raising distinguishable errors
- timeouts should cancel the test properly rather than abandoning the run

And I'm still uncertain about whether it ends up being simpler to hard-code something for tests or to add basic support for multiple clocks (possibly hidden for now behind some private API, though we try hard to avoid internal-only APIs...). Probably I should try implementing it and that will give me a good clue :-)

glyph commented 7 years ago

On May 26, 2017, at 12:08 AM, Nathaniel J. Smith notifications@github.com wrote:

> AFAIK literally nobody else actually does implement this

I can't find the documentation for the Cocoa way to do this offhand, but timerfd_create on Linux implements the multiple-calendars thing: http://man7.org/linux/man-pages/man2/timerfd_create.2.html

Calendar time is implemented via CLOCK_REALTIME & CLOCK_REALTIME_ALARM as I understand it.

-g

njsmith commented 7 years ago

For the record, on macOS it looks like you can do this by requesting the HOST_NOTIFY_CALENDAR_CHANGE Mach notification and then adding the relevant Mach port to your kqueue (EVFILT_MACHPORT). Or by using GCD's dispatch_after, which accepts wall-clock dispatch times. Some more details.

And on Windows there's SetWaitableTimer. I think you use it by setting up a dedicated thread that just sits in an alertable state and occasionally wakes up to notify the main thread and reset the timer.

An interesting point about these OS features that actually do the full alerting (as opposed to merely giving us the tools to track the monotonic/wall-clock offset ourselves) is that they don't fit our current clock abstraction, which assumes that the job of a clock is to convert abstract deadlines into monotonic sleep timeouts. If we can assume that all clocks represent time using floats then it's almost enough – if we replaced the current "hey tell me how many ms until this deadline: ..." method with a "hey the next deadline is ..., and btw can you give me an upper bound on how long I should sleep" method, then it could work. (That method would adjust the timer, then return inf, and arrange that the run loop wakes up at the given time, at which point its normal check-for-expired-timers logic would kick in and expire any necessary timers.)
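
Concretely, the change might look something like this sketch against a trio.abc.Clock-shaped interface; the AlarmCapableClock class and its method name are mine, not a proposal that was settled here:

```python
import abc
import math

class Clock(abc.ABC):
    # shape of the current abstraction: the clock converts an abstract
    # deadline into a concrete sleep duration
    @abc.abstractmethod
    def current_time(self) -> float: ...

    @abc.abstractmethod
    def deadline_to_sleep_time(self, deadline: float) -> float: ...

class AlarmCapableClock(abc.ABC):
    # hypothetical replacement described above: the run loop reports
    # the next deadline, and the clock returns only an *upper bound*
    # on the sleep. A timerfd/SetWaitableTimer-backed clock would arm
    # the OS timer for `deadline` and return math.inf, trusting the OS
    # wakeup to trigger the run loop's normal expired-timer check.
    @abc.abstractmethod
    def current_time(self) -> float: ...

    @abc.abstractmethod
    def sleep_bound_for_deadline(self, deadline: float) -> float: ...
```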

njsmith commented 7 years ago

Of course another feature of these fancy systems (and actually the not-so-fancy ones too) is that they don't actually need to be integrated deep into trio's event loop at all; you could have an external library that uses trio's public APIs to provide current_wall_clock_time and sleep_until_wall_clock_time functions and that would be fine, I think. It'd be a little bit fiddly internally to avoid allocating a new kernel object per call to sleep_until_wall_clock_time but I don't think it would be a big deal.
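
As a proof of concept, here's a minimal portable sketch of those two functions built only on public trio APIs; it polls instead of arming timerfd/kqueue/SetWaitableTimer machinery, which dodges the kernel-object bookkeeping entirely at the cost of waking up roughly once per second:

```python
import time
import trio

def current_wall_clock_time() -> float:
    return time.time()

async def sleep_until_wall_clock_time(deadline: float) -> None:
    # sleep in short increments and re-check the wall clock each time,
    # so NTP steps and suspend/resume are noticed within one interval
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return
        await trio.sleep(min(remaining, 1.0))

# e.g. sleep until 10 wall-clock seconds from now:
# trio.run(sleep_until_wall_clock_time, time.time() + 10)
```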

What's special about test harness timeouts is that they want to use the same clock conceptually as the rest of trio, but they want to avoid being mocked. (Don't we all.) I think the different timescales thing is actually orthogonal. So #173 is for timescales, and this bug can go back to being about test timeouts.

I just realized there's a very important reason why we don't want test timeouts to be implemented in the obvious way using a with fail_after(120): ... wrapped around the whole test, even if we didn't support pluggable clocks: trio.current_deadline() would expose this to the code under test, and changing the test timeout should not have observable effects on the code under test! And in fact, current_deadline is probably also a show-stopper for supporting multiple monotonic clocks in different parts of the same run, because then a single context could end up with deadlines on its cancel stack that are on different clocks, and there's no way for current_deadline() to compare them to find the minimum. Fortunately I didn't really want to support this anyway, so it's nice to find a solid excuse for my bias.
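
To make the leak concrete (spelling the query as trio.current_effective_deadline(), the name this API shipped under): wrapping the whole test in fail_after changes what the code under test can observe:

```python
import trio

async def code_under_test():
    # with no enclosing timeout this prints +inf; under a harness-imposed
    # fail_after(120) it becomes a finite time ~120s in the future — an
    # observable behavior change caused purely by the test harness
    print(trio.current_effective_deadline())

async def main():
    with trio.fail_after(120):
        await code_under_test()

trio.run(main)
```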

njsmith commented 6 years ago

Interesting subtlety I learned from the pytest-timeout source: if you're debugging a test, then you want to disable the timeout.

njsmith commented 6 years ago

They also have some clever code (in timeout_timer) to disable pytest's normal stdout/stderr capturing after a timeout, because if they end up killing the process then they want to dump output before that. I guess we wouldn't need to care about this...