Use CPU time for --warn-long and a low default limit for non-long tests

orlitzky commented 2 years ago

As discussed in #32973, the default value of --warn-long is roughly a minute, independent of the --long flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:

https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments

This ticket addresses two problems:

The timing information is often unavailable. We should still warn about long tests in that case.
The default of 60 CPU-equivalent seconds is much too high when --long is not in use.

Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high --warn-long limit partially address that.

So in targeting our two issues, we first aim to switch the timing reports and --warn-long flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and CFLAGS; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of --warn-long to 10 CPU-seconds when --long was not given.

The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't wait() for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.

CC: @vbraun @tornaria

Component: doctest framework

Work Issues: cputime by PID on macOS

Author: Michael Orlitzky

Branch/Commit: u/mjo/ticket/32981 @ d7d2cff

Reviewer: https://github.com/sagemath/sage/actions/runs/1992568379

Issue created by migration from https://trac.sagemath.org/ticket/32981

orlitzky commented 2 years ago

comment:1

Volker, any idea why we used wall time for the limit rather than CPU time? It's making it a bit awkward to pick a multiplier when no timing information is available (which is usually the case for me).

orlitzky commented 2 years ago

Branch: u/mjo/ticket/32981

orlitzky commented 2 years ago

Commit: 6c68187

orlitzky commented 2 years ago

Author: Michael Orlitzky

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 6c68187 to 2959d09

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2959d09 Trac #32981: decrease --warn-long limit for non-long tests.

orlitzky commented 2 years ago

comment:3

This is a bit too lenient IMO... allowing 15s for a non-long test when no timing information is available. Still, it's an improvement.

orlitzky commented 2 years ago

comment:4

I'm just going to put in the effort to switch this to CPU time, which makes a lot more sense for a time measurement that needs to be consistent across machines.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 2959d09 to 66c4cc8

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`0de97ac`	`Trac #32981: use CPU time for --warn-long.`
`4b9b07b`	`Trac #32981: decrease --warn-long limit for non-long tests.`
`66c4cc8`	`Trac #32981: use a faster file to test the doctest framework.`

orlitzky commented 2 years ago

Description changed:

--- 
+++ 
@@ -1,6 +1,11 @@
-As discussed in #32973, the default value of `--warn-long` is roughly a minute, independent of the `--long` flag. Our developers' guide suggests that even a "long time" test should complete in under five seconds,
+As discussed in #32973, the default value of `--warn-long` is roughly a minute, independent of the `--long` flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:

 https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments

-so I think it's fair to warn about non-long-time tests that take longer than (say) ten seconds.
+This ticket addresses two problems:

+1. The timing information is often unavailable. We should still warn about long tests in that case.
+2. The default of 60 wall-time seconds is much to high when `--long` is not in use.
+
+Towards those ends, the first thing we do is switch the `--warn-long` flag to accept CPU time (as opposed to wall time) as an argument. This makes it independent of the CPU, and allows us to eschew the timing information heuristic. It should also make the warnings much more consistent. Afterwards, we set the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
+

orlitzky commented 2 years ago

New commits:

`0de97ac`	`Trac #32981: use CPU time for --warn-long.`
`4b9b07b`	`Trac #32981: decrease --warn-long limit for non-long tests.`
`66c4cc8`	`Trac #32981: use a faster file to test the doctest framework.`

orlitzky commented 2 years ago

Description changed:

--- 
+++ 
@@ -5,7 +5,7 @@
 This ticket addresses two problems:

 1. The timing information is often unavailable. We should still warn about long tests in that case.
-2. The default of 60 wall-time seconds is much to high when `--long` is not in use.
+2. The default of 60 CPU-equivalent seconds is much to high when `--long` is not in use.

 Towards those ends, the first thing we do is switch the `--warn-long` flag to accept CPU time (as opposed to wall time) as an argument. This makes it independent of the CPU, and allows us to eschew the timing information heuristic. It should also make the warnings much more consistent. Afterwards, we set the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.

mwageringel commented 2 years ago

comment:9

One reason for using wall time instead of CPU time might be that it works better for computations that use external processes like Pexpect:

sage: R = PolynomialRing(GF(101), 'x', 8)
....: J = sage.rings.ideal.Cyclic(R)
....: %time gb = J.groebner_basis(algorithm='singular')
CPU times: user 917 ms, sys: 52.6 ms, total: 969 ms
Wall time: 33.2 s

jhpalmieri commented 2 years ago

comment:10

Maybe measure both and pay attention to wall time only if it is much longer than CPU time. I don't know if "much longer" should be a raw time (5 seconds longer) or a multiple (10 times as long). We want informative messages, so this doesn't have to be flawless.

orlitzky commented 2 years ago

comment:11

Well.. that's embarrassing. I'll have to think about it.

orlitzky commented 2 years ago

comment:12

Oh, we just have to pass subprocesses=True inside the doctest timer. I guess nobody noticed until now because the cputime wasn't reported anyway. I'll update the branch once I'm sure it works correctly.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 66c4cc8 to d869a09

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`a5a65c0`	`Trac #32981: use CPU time for --warn-long.`
`e45130b`	`Trac #32981: decrease --warn-long limit for non-long tests.`
`d204ab4`	`Trac #32981: use a faster file to test the doctest framework.`
`9707639`	`Trac #32981: collect cputime for subprocesses while doctesting.`
`5c29f59`	`Trac #32981: report doctest success in CPU time.`
`3764197`	`Trac #32981: account for subprocesses in sage0 interface's cputime().`
`d869a09`	`Trac #32981: don't pass "None" to cputime() in sage0 interface.`

orlitzky commented 2 years ago

comment:14

That fixed the immediate problem, but caused several more. Doctesting the doctest framework using the sage pexpect interface to itself is pretty fragile, who would've thought...

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from d869a09 to 61d7fe5

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

`4eca022`	`Trac #32981: handle "None" object in sage display formatter.`
`e815b16`	`Trac #32981: disable sage0's cputime() in a few doctests.`
`61d7fe5`	`Trac #32981: add a sage0 doctest workaround to our display formatter.`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 61d7fe5 to 3ae9735

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`801c29d`	`Trac #32981: disable sage0's cputime() in a few doctests.`
`a0d5dc8`	`Trac #32981: add a sage0 doctest workaround to our display formatter.`
`d6ad665`	`Trac #32981: disable cputime in some GAP interface tests.`
`9d532cf`	`Trac #32981: hide the user-facing "Gap crashed!" message.`
`3ae9735`	`Trac #32981: add dummy _strip_prompt methods to gap/gp interfaces.`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 3ae9735 to 58fc03a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`e816a4d`	`Trac #32981: report doctest success in CPU time.`
`58fc03a`	`Trac #32981: cputime for pexpect interfaces on Linux/BSD.`

orlitzky commented 2 years ago

comment:18

Well, at least I know why Volker chose to use walltime now. Several times yesterday I almost quit and wandered into the woods.

Using the existing cputime(subprocesses=True) isn't going to work. Our pexpect interfaces are flaky, and we have a lot of doctests that check for low-level "incidental" details that appear when using them. Moreover most don't actually implement cputime(). And that's just for standard packages -- the optional ones should be much worse.

I was able to get the tests passing, but only by introducing twice as many WTFs as I discovered. I wouldn't commit such an atrocity, nor do I have the time to properly investigate and fix all of the underlying issues. Instead, I've reverted back to where the branch was two days ago, and added a commit that reliably computes the cputime for all subprocesses.... but only on Linux/BSD. The same information is probably available on macOS, but I don't have it in front of me to poke around.

While platform-specific code is lame, I think this may be a reasonable trade-off, especially if we can get it working on macOS. We only need someone to notice & fix long tests, not everyone. And the missing information can only hurt for (a) pexpect tests, that (b) happen to run long. So it's not like the entire test suite will have incorrect timing information on e.g. Cygwin.

mkoeppe commented 2 years ago

comment:19

If only there were a Python package for that...

orlitzky commented 2 years ago

comment:20

Replying to @mkoeppe:

If only there were a Python package for that...

If you're referring to psutil, it's mainly a C package and consists of nothing BUT platform-specific code. The version we had was almost five years out of date and required a huge unmaintained patch that was rejected entirely by upstream. I'll take the ten lines we have now instead.

Worst case, we can get the information from the OSX kernel with a another five lines of code.

orlitzky commented 2 years ago

Description changed:

--- 
+++ 
@@ -5,7 +5,10 @@
 This ticket addresses two problems:

 1. The timing information is often unavailable. We should still warn about long tests in that case.
-2. The default of 60 CPU-equivalent seconds is much to high when `--long` is not in use.
+2. The default of 60 CPU-equivalent seconds is much too high when `--long` is not in use.

-Towards those ends, the first thing we do is switch the `--warn-long` flag to accept CPU time (as opposed to wall time) as an argument. This makes it independent of the CPU, and allows us to eschew the timing information heuristic. It should also make the warnings much more consistent. Afterwards, we set the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
+Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high `--warn-long` limit partially address that.

+So in targeting our two issues, we first aim to switch the timing reports and `--warn-long` flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and `CFLAGS`; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
+
+The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't `wait()` for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.

orlitzky commented 2 years ago

Work Issues: cputime by PID on macOS

orlitzky commented 2 years ago

comment:22

The information we need can be obtained from proc_pidinfo(pid, PROC_PIDTASKINFO,...) in libproc on macOS. I found a gist that might serve as inspiration:

https://gist.github.com/nguyen-phillip/de66b0ea2144e20ddd844c41c9d93eb9

In theory all we need to do is get a proc_taskinfo struct from that call, and read its pti_total_user and pti_total_system fields. But I don't have a mac to test on so someone else will have to take this step.

mkoeppe commented 2 years ago

comment:23

Replying to @orlitzky:

The version we had was almost five years out of date and required a huge unmaintained patch

We needed the patch for Cygwin only. We could just use an upgraded psutil and do whatever is needed for Cygwin in the Sage library.

orlitzky commented 2 years ago

comment:24

Replying to @mkoeppe:

Replying to @orlitzky:

The version we had was almost five years out of date and required a huge unmaintained patch

We needed the patch for Cygwin only. We could just use an upgraded psutil and do whatever is needed for Cygwin in the Sage library.

Yes but if mkoeppe ever gets hit by a bus, the rest of us will be crushed under 25 hours/day of package upgrades and maintenance. I appreciate all you do, but there's a huge cost to every additional dependency measured over the years.

mkoeppe commented 2 years ago

comment:25

It's a well-maintained package and using it beats maintaining Sage-specific system code ... which is not even written yet ...

orlitzky commented 2 years ago

comment:26

There's a point after which I agree, but assuming that the package itself causes zero problems: every distro needs to package it and keep it upgraded, and every user needs to install it every time they build sage (wasting time, space, and bandwidth), and every upstream release usually gets a ticket->branch->test->review cycle within sage itself.

The total amount of code we're talking about is less than the text files in build/pkgs/psutil contained. It's in a non-critical location in a pseudo-private, non-user-facing function. We'll go years without ever thinking about it.

mkoeppe commented 2 years ago

comment:27

Replying to @orlitzky:

every distro needs to package

https://repology.org/project/python:psutil/versions

orlitzky commented 2 years ago

comment:28

Replying to @mkoeppe:

https://repology.org/project/python:psutil/versions

That's not the hard part. The Gentoo package is available on amd64, arm, arm64, hppa, ppc, ppc64, sparc, x86, alpha, ia64, riscv, m68k, mips, and s390. Every new version has to be tested, and stabilized, and most introduce new bugs that have to be triaged, upstreamed, patched, etc. If psutil is not used by sage, I don't have to care about any of that. If it is, I often have to step in to ensure that the distro package is compatible with that sage expects. Repeat two-hundred-some times on however-many distros and you have a never-ending stream of work.

mkoeppe commented 2 years ago

comment:29

https://gitweb.gentoo.org/repo/gentoo.git/log/dev-python/psutil doesn't look particularly dramatic

orlitzky commented 2 years ago

comment:30

You'd get a better idea from the bug tracker, but I don't know what point we're arguing. Sage was stuck on the same version of psutil for five years. That naturally did not require much maintenance either within sage or without.

mkoeppe commented 2 years ago

comment:31

You were trying to argue that somehow creating new homegrown platform-dependent code is a better strategy than just using an existing upstream project (for which we have no particular version requirements and never had).

orlitzky commented 2 years ago

comment:32

headline: man who doesn't do dishes thinks it's more elegant to use a different spoon for each ravioli

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 58fc03a to 06209f0

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`ea1d5e5`	`Trac #32981: report both cpu and wall times for long tests.`
`bbe352f`	`Trac #32981: specify wall seconds in per-file doctest report.`
`06209f0`	`Trac #32981: specify wall seconds in sage -t --verbose output.`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`ac22a95`	`Trac #32981: use CPU time for --warn-long.`
`98ac50c`	`Trac #32981: decrease --warn-long limit for non-long tests.`
`071de56`	`Trac #32981: use a faster file to test the doctest framework.`
`66cf8bd`	`Trac #32981: report verbose doctest success in CPU time.`
`ec3aa59`	`Trac #32981: cputime for pexpect interfaces on Linux/BSD.`
`a1f8ae1`	`Trac #32981: report both cpu and wall times for long tests.`
`60fb0ef`	`Trac #32981: specify wall seconds in per-file doctest report.`
`ba458ee`	`Trac #32981: specify wall seconds in sage -t --verbose output.`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from 06209f0 to ba458ee

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`6c13ede`	`Trac #32981: cputime for pexpect interfaces on Linux/BSD.`
`70803a2`	`Trac #32981: report both cpu and wall times for long tests.`
`8203890`	`Trac #32981: specify wall seconds in per-file doctest report.`
`d7d2cff`	`Trac #32981: specify wall seconds in sage -t --verbose output.`

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 2 years ago

Changed commit from ba458ee to d7d2cff

orlitzky commented 2 years ago

Reviewer: https://github.com/sagemath/sage/actions/runs/1992568379

orlitzky commented 2 years ago

comment:36

I ran this on GH but the actions page has been failing to load for a week. I've reported it to GH and a fix is "being worked on." Aside from that, I think it's ready for review.

In the spirit of "do no harm," I have avoided outputting CPU time statistics wherever possible to avoid outputting confusing numbers on macOS. Nevertheless, in context, the lack of fast CPU time for subprocesses on macOS is not a deal-breaker: the only situation where it will be missed is if a subprocess takes a lot of CPU and would normally violate the --warn-long limit, and instead appears to run fast on macOS. But the status quo is that using wall time already makes --warn-long useless; and false negatives are not harmful. If the purpose of --warn-long is to prevent slow tests from being added to sage, then we only need --warn-long to trigger somewhere (e.g. the patchbots), and not everywhere. Overlooking the occasional warning for certain tests on one platform isn't contrary to that goal, and in any case is not worse than what we already have -- no usable warnings at all.

Of course I'll open a follow-up ticket for adding the macOS accounting should anyone care to do it.

sagemath / sage

Use CPU time for --warn-long and a low default limit for non-long tests #32981