Closed: orlitzky closed this issue 10 months ago.
Volker, any idea why we used wall time for the limit rather than CPU time? It's making it a bit awkward to pick a multiplier when no timing information is available (which is usually the case for me).
Branch: u/mjo/ticket/32981
Author: Michael Orlitzky
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
2959d09 | Trac #32981: decrease --warn-long limit for non-long tests. |
This is a bit too lenient IMO... allowing 15s for a non-long test when no timing information is available. Still, it's an improvement.
I'm just going to put in the effort to switch this to CPU time, which makes a lot more sense for a time measurement that needs to be consistent across machines.
Description changed:
---
+++
@@ -1,6 +1,11 @@
-As discussed in #32973, the default value of `--warn-long` is roughly a minute, independent of the `--long` flag. Our developers' guide suggests that even a "long time" test should complete in under five seconds,
+As discussed in #32973, the default value of `--warn-long` is roughly a minute, independent of the `--long` flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:
https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments
-so I think it's fair to warn about non-long-time tests that take longer than (say) ten seconds.
+This ticket addresses two problems:
+1. The timing information is often unavailable. We should still warn about long tests in that case.
+2. The default of 60 wall-time seconds is much to high when `--long` is not in use.
+
+Towards those ends, the first thing we do is switch the `--warn-long` flag to accept CPU time (as opposed to wall time) as an argument. This makes it independent of the CPU, and allows us to eschew the timing information heuristic. It should also make the warnings much more consistent. Afterwards, we set the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
+
Description changed:
---
+++
@@ -5,7 +5,7 @@
This ticket addresses two problems:
1. The timing information is often unavailable. We should still warn about long tests in that case.
-2. The default of 60 wall-time seconds is much to high when `--long` is not in use.
+2. The default of 60 CPU-equivalent seconds is much to high when `--long` is not in use.
Towards those ends, the first thing we do is switch the `--warn-long` flag to accept CPU time (as opposed to wall time) as an argument. This makes it independent of the CPU, and allows us to eschew the timing information heuristic. It should also make the warnings much more consistent. Afterwards, we set the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
One reason for using wall time instead of CPU time might be that it works better for computations that use external processes like Pexpect:
sage: R = PolynomialRing(GF(101), 'x', 8)
....: J = sage.rings.ideal.Cyclic(R)
....: %time gb = J.groebner_basis(algorithm='singular')
CPU times: user 917 ms, sys: 52.6 ms, total: 969 ms
Wall time: 33.2 s
Maybe measure both and pay attention to wall time only if it is much longer than CPU time. I don't know if "much longer" should be a raw time (5 seconds longer) or a multiple (10 times as long). We want informative messages, so this doesn't have to be flawless.
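To make that suggestion concrete, here is a hypothetical version of the heuristic; the function name, the factor of 10, and the small guard value are illustrative only, not anything on the branch:

```python
# Hypothetical sketch of the heuristic suggested above: trust the CPU time,
# but fall back to the wall time when the wall time dwarfs it, since that
# usually means the work happened in an external process we didn't measure.
def time_to_compare(cpu_seconds, wall_seconds, factor=10):
    """Return the duration that should be checked against --warn-long."""
    # max(..., 0.1) avoids triggering the fallback for trivially fast tests.
    if wall_seconds > factor * max(cpu_seconds, 0.1):
        return wall_seconds
    return cpu_seconds
```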
Well... that's embarrassing. I'll have to think about it.
Oh, we just have to pass `subprocesses=True` inside the doctest timer. I guess nobody noticed until now because the cputime wasn't reported anyway. I'll update the branch once I'm sure it works correctly.
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
a5a65c0 | Trac #32981: use CPU time for --warn-long. |
e45130b | Trac #32981: decrease --warn-long limit for non-long tests. |
d204ab4 | Trac #32981: use a faster file to test the doctest framework. |
9707639 | Trac #32981: collect cputime for subprocesses while doctesting. |
5c29f59 | Trac #32981: report doctest success in CPU time. |
3764197 | Trac #32981: account for subprocesses in sage0 interface's cputime(). |
d869a09 | Trac #32981: don't pass "None" to cputime() in sage0 interface. |
That fixed the immediate problem, but caused several more. Doctesting the doctest framework using the sage pexpect interface to itself is pretty fragile; who would've thought...
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
801c29d | Trac #32981: disable sage0's cputime() in a few doctests. |
a0d5dc8 | Trac #32981: add a sage0 doctest workaround to our display formatter. |
d6ad665 | Trac #32981: disable cputime in some GAP interface tests. |
9d532cf | Trac #32981: hide the user-facing "Gap crashed!" message. |
3ae9735 | Trac #32981: add dummy _strip_prompt methods to gap/gp interfaces. |
Well, at least I know why Volker chose to use walltime now. Several times yesterday I almost quit and wandered into the woods.
Using the existing `cputime(subprocesses=True)` isn't going to work. Our pexpect interfaces are flaky, and we have a lot of doctests that check for low-level "incidental" details that appear when using them. Moreover, most of the interfaces don't actually implement `cputime()`. And that's just for the standard packages -- the optional ones should be much worse.
I was able to get the tests passing, but only by introducing twice as many WTFs as I discovered. I wouldn't commit such an atrocity, nor do I have the time to properly investigate and fix all of the underlying issues. Instead, I've reverted to where the branch was two days ago, and added a commit that reliably computes the cputime for all subprocesses... but only on Linux/BSD. The same information is probably available on macOS, but I don't have a Mac in front of me to poke around on.
While platform-specific code is lame, I think this may be a reasonable trade-off, especially if we can get it working on macOS. We only need someone to notice and fix long tests, not everyone. And the missing information can only hurt for (a) pexpect tests that (b) happen to run long. So it's not like the entire test suite will have incorrect timing information on e.g. Cygwin.
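For anyone curious, here is roughly what the PID-based approach looks like on Linux. This is a sketch of the idea, not the code from the commit, and it assumes a `/proc`-style stat file (so it would not cover every BSD as written):

```python
# Minimal sketch: read the CPU time consumed so far by a given PID from
# /proc/<pid>/stat.  Fields 14 and 15 (1-indexed) are utime and stime,
# both in clock ticks.
import os

def pid_cputime_linux(pid):
    """Return the CPU seconds (user + system) consumed by ``pid`` so far."""
    with open("/proc/%d/stat" % pid) as f:
        stat = f.read()
    # The second field (comm) may contain spaces, so split after its ')'.
    fields = stat.rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15 overall
    return (utime + stime) / os.sysconf("SC_CLK_TCK")
```

Because we only poll a PID that the pexpect interface already knows, nothing has to `wait()` on the subprocess or ask it to cooperate.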
If only there were a Python package for that...
Replying to @mkoeppe:
If only there were a Python package for that...
If you're referring to psutil, it's mainly a C package and consists of nothing BUT platform-specific code. The version we had was almost five years out of date and required a huge unmaintained patch that was rejected entirely by upstream. I'll take the ten lines we have now instead.
Worst case, we can get the information from the macOS kernel with another five lines of code.
Description changed:
---
+++
@@ -5,7 +5,10 @@
This ticket addresses two problems:
1. The timing information is often unavailable. We should still warn about long tests in that case.
-2. The default of 60 CPU-equivalent seconds is much to high when `--long` is not in use.
+2. The default of 60 CPU-equivalent seconds is much too high when `--long` is not in use.
-Towards those ends, the first thing we do is switch the `--warn-long` flag to accept CPU time (as opposed to wall time) as an argument. This makes it independent of the CPU, and allows us to eschew the timing information heuristic. It should also make the warnings much more consistent. Afterwards, we set the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
+Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high `--warn-long` limit partially address that.
+So in targeting our two issues, we first aim to switch the timing reports and `--warn-long` flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and `CFLAGS`; those will be addressed in #33022). With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.
+
+The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't `wait()` for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.
Work Issues: cputime by PID on macOS
The information we need can be obtained from `proc_pidinfo(pid, PROC_PIDTASKINFO, ...)` in libproc on macOS. I found a gist that might serve as inspiration:
https://gist.github.com/nguyen-phillip/de66b0ea2144e20ddd844c41c9d93eb9
In theory, all we need to do is get a `proc_taskinfo` struct from that call and read its `pti_total_user` and `pti_total_system` fields. But I don't have a Mac to test on, so someone else will have to take this step.
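For whoever picks this up, here is a speculative ctypes sketch of that call. It is untested (no Mac here); the constant and struct layout are copied from `<sys/proc_info.h>`, and the returned times are in Mach absolute-time units, so they probably need `mach_timebase_info()` to convert to seconds:

```python
# Speculative, untested sketch: per-PID CPU time on macOS via libproc.
import ctypes

PROC_PIDTASKINFO = 4  # from <sys/proc_info.h>

class proc_taskinfo(ctypes.Structure):
    _fields_ = [
        ("pti_virtual_size",      ctypes.c_uint64),
        ("pti_resident_size",     ctypes.c_uint64),
        ("pti_total_user",        ctypes.c_uint64),
        ("pti_total_system",      ctypes.c_uint64),
        ("pti_threads_user",      ctypes.c_uint64),
        ("pti_threads_system",    ctypes.c_uint64),
        ("pti_policy",            ctypes.c_int32),
        ("pti_faults",            ctypes.c_int32),
        ("pti_pageins",           ctypes.c_int32),
        ("pti_cow_faults",        ctypes.c_int32),
        ("pti_messages_sent",     ctypes.c_int32),
        ("pti_messages_received", ctypes.c_int32),
        ("pti_syscalls_mach",     ctypes.c_int32),
        ("pti_syscalls_unix",     ctypes.c_int32),
        ("pti_csw",               ctypes.c_int32),
        ("pti_threadnum",         ctypes.c_int32),
        ("pti_numrunning",        ctypes.c_int32),
        ("pti_priority",          ctypes.c_int32),
    ]

def pid_cputime_macos(pid):
    """Return (user, system) time consumed by ``pid``, in Mach time units."""
    libproc = ctypes.CDLL("libproc.dylib", use_errno=True)
    libproc.proc_pidinfo.argtypes = [ctypes.c_int, ctypes.c_int,
                                     ctypes.c_uint64, ctypes.c_void_p,
                                     ctypes.c_int]
    libproc.proc_pidinfo.restype = ctypes.c_int
    info = proc_taskinfo()
    n = libproc.proc_pidinfo(pid, PROC_PIDTASKINFO, 0,
                             ctypes.byref(info), ctypes.sizeof(info))
    if n < ctypes.sizeof(info):
        raise OSError(ctypes.get_errno(), "proc_pidinfo failed")
    return (info.pti_total_user, info.pti_total_system)
```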
Replying to @orlitzky:
The version we had was almost five years out of date and required a huge unmaintained patch
We needed the patch for Cygwin only. We could just use an upgraded psutil and do whatever is needed for Cygwin in the Sage library.
Replying to @mkoeppe:
Replying to @orlitzky:
The version we had was almost five years out of date and required a huge unmaintained patch
We needed the patch for Cygwin only. We could just use an upgraded psutil and do whatever is needed for Cygwin in the Sage library.
Yes but if mkoeppe ever gets hit by a bus, the rest of us will be crushed under 25 hours/day of package upgrades and maintenance. I appreciate all you do, but there's a huge cost to every additional dependency measured over the years.
It's a well-maintained package and using it beats maintaining Sage-specific system code ... which is not even written yet ...
There's a point beyond which I agree, but even assuming that the package itself causes zero problems: every distro needs to package it and keep it upgraded, every user needs to install it every time they build sage (wasting time, space, and bandwidth), and every upstream release usually gets a ticket->branch->test->review cycle within sage itself.
The total amount of code we're talking about is less than what the text files in `build/pkgs/psutil` contained. It's in a non-critical location, in a pseudo-private, non-user-facing function. We'll go years without ever thinking about it.
Replying to @mkoeppe:
That's not the hard part. The Gentoo package is available on amd64, arm, arm64, hppa, ppc, ppc64, sparc, x86, alpha, ia64, riscv, m68k, mips, and s390. Every new version has to be tested and stabilized, and most introduce new bugs that have to be triaged, upstreamed, patched, etc. If psutil is not used by sage, I don't have to care about any of that. If it is, I often have to step in to ensure that the distro package is compatible with what sage expects. Repeat two-hundred-some times on however many distros and you have a never-ending stream of work.
https://gitweb.gentoo.org/repo/gentoo.git/log/dev-python/psutil doesn't look particularly dramatic
You'd get a better idea from the bug tracker, but I don't know what point we're arguing. Sage was stuck on the same version of psutil for five years. That naturally did not require much maintenance either within sage or without.
You were trying to argue that somehow creating new homegrown platform-dependent code is a better strategy than just using an existing upstream project (for which we have no particular version requirements and never had).
headline: man who doesn't do dishes thinks it's more elegant to use a different spoon for each ravioli
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
ac22a95 | Trac #32981: use CPU time for --warn-long. |
98ac50c | Trac #32981: decrease --warn-long limit for non-long tests. |
071de56 | Trac #32981: use a faster file to test the doctest framework. |
66cf8bd | Trac #32981: report verbose doctest success in CPU time. |
ec3aa59 | Trac #32981: cputime for pexpect interfaces on Linux/BSD. |
a1f8ae1 | Trac #32981: report both cpu and wall times for long tests. |
60fb0ef | Trac #32981: specify wall seconds in per-file doctest report. |
ba458ee | Trac #32981: specify wall seconds in sage -t --verbose output. |
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
6c13ede | Trac #32981: cputime for pexpect interfaces on Linux/BSD. |
70803a2 | Trac #32981: report both cpu and wall times for long tests. |
8203890 | Trac #32981: specify wall seconds in per-file doctest report. |
d7d2cff | Trac #32981: specify wall seconds in sage -t --verbose output. |
I ran this on GH but the actions page has been failing to load for a week. I've reported it to GH and a fix is "being worked on." Aside from that, I think it's ready for review.
In the spirit of "do no harm," I have avoided outputting CPU time statistics wherever possible, so as not to show confusing numbers on macOS. Nevertheless, in context, the lack of fast CPU time for subprocesses on macOS is not a deal-breaker: the only situation where it will be missed is when a subprocess takes a lot of CPU time, would normally violate the `--warn-long` limit, and instead appears to run fast on macOS. But the status quo is that using wall time already makes `--warn-long` useless, and false negatives are not harmful. If the purpose of `--warn-long` is to prevent slow tests from being added to sage, then we only need `--warn-long` to trigger somewhere (e.g. on the patchbots), not everywhere. Overlooking the occasional warning for certain tests on one platform isn't contrary to that goal, and in any case is no worse than what we already have -- no usable warnings at all.
Of course I'll open a follow-up ticket for adding the macOS accounting should anyone care to do it.
As discussed in #32973, the default value of `--warn-long` is roughly a minute, independent of the `--long` flag, and is only enabled when timing information from previous test runs is available. Our developers' guide suggests that even a "long time" test should complete in under five seconds:

https://doc.sagemath.org/html/en/developer/doctesting.html#optional-arguments

This ticket addresses two problems:

1. The timing information is often unavailable. We should still warn about long tests in that case.
2. The default of 60 CPU-equivalent seconds is much too high when `--long` is not in use.

Both of these issues stem from using the "wall time" to measure how long a test has run. Wall time is inherently unstable; if you're watching a movie, the tests will take longer. On a slower computer, the tests will take longer. Et cetera. The use of timing information from previous runs and the obscenely high `--warn-long` limit partially address that.

So in targeting our two issues, we first aim to switch the timing reports and `--warn-long` flag to use CPU time instead of wall time. CPU time is more reliable, since it only reports time actually spent working on the problem. (It's still sensitive to things like physical CPU time and `CFLAGS`; those will be addressed in #33022.) With CPU time being used, the timings will be much more consistent, allowing us to lower the default value of `--warn-long` to 10 CPU-seconds when `--long` was not given.

The main difficulty is in accounting for CPU time spent in pexpect subprocesses. We don't `wait()` for those and they may not terminate, so the OS's usual methods of retrieving child process statistics don't work for them. At present we have a PID-based solution that works quickly on Linux/BSD, but something similar for macOS would be nice.

CC: @vbraun @tornaria
Component: doctest framework
Work Issues: cputime by PID on macOS
Author: Michael Orlitzky
Branch/Commit: u/mjo/ticket/32981 @ d7d2cff
Reviewer: https://github.com/sagemath/sage/actions/runs/1992568379
Issue created by migration from https://trac.sagemath.org/ticket/32981