A similar issue may affect meshmode: https://gitlab.tiker.net/inducer/pytato/-/jobs/538649
Before this started happening:
============================= slowest 10 durations =============================
958.88s call test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case9]
552.57s call test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-1-7-5-False]
452.24s call test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case0]
444.78s call test/test_cost_model.py::test_cost_model_correctness[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-2-True-False]
444.33s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case0]
424.14s call test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case1]
363.69s call test/test_layer_pot.py::test_off_surface_eval[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-True]
351.86s call test/test_layer_pot_eigenvalues.py::test_sphere_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-sumpy-2-3-3]
246.16s call test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-2-7-5-False]
229.97s call test/test_cost_model.py::test_cost_model_correctness[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-3-False-False]
=========================== short test summary info ============================
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
========= 228 passed, 1 skipped, 18489 warnings in 2603.72s (0:43:23) ==========
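For reference, that "slowest 10 durations" table is pytest's `--durations` report; a minimal sketch of reproducing it locally (the `test/` path is an assumption about the checkout layout):

```python
# Minimal sketch: reproduce a "slowest N durations" report for a local
# checkout; "test/" is an assumption about where the suite lives.
import pytest

raise SystemExit(pytest.main(["test/", "--durations=10"]))
```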
Those pip freezes are literally identical.
Recent runs of https://github.com/inducer/pytential/pull/195 are affected on GitHub, too.
Found an old build environment (from March 22, on koelsch, in /var/lib/gitlab-runner/builds/0d8732fb/0/inducer/pytential/test). By date, it should be from before this all started going poorly, according to https://gitlab.tiker.net/inducer/pytential/-/pipelines. Now, unfortunately, the only difference in `pip freeze` is `platformdirs` going from 3.1.1 to 3.2.0, which I'm not sure is relevant.
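For anyone repeating that comparison, a small sketch of diffing two saved `pip freeze` outputs; the file names here are made up:

```python
# Small sketch: diff two saved `pip freeze` outputs. The file names are
# made up; substitute whatever the old and new environments were saved as.
import difflib

with open("freeze-old.txt") as f:
    old = f.read().splitlines()
with open("freeze-new.txt") as f:
    new = f.read().splitlines()

print("\n".join(difflib.unified_diff(old, new, "freeze-old.txt", "freeze-new.txt", lineterm="")))
```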
Whatever it is, it's affecting both Conda and bare-venv runs: https://gitlab.tiker.net/inducer/pytential/-/pipelines/409699
Hm, just ran the `test_linalg_skeletonization` test locally and it seems to be doing just fine.
I'm a bit confused by the `-case3` at the end of that, though, since the test has 4 cases it runs with and 2 are marked as slow:
https://github.com/inducer/pytential/blob/b63b97965a1a2ef56155dcdc46d0db9fb36e6a24/test/test_linalg_skeletonization.py#L386-L392
Is the CI suddenly running the slow tests too?
EDIT: Take some of that back, I also ran it with `-m 'not slowtest'`. Running `-case3` also seems to take a whole lot of time, but that doesn't explain why it's on the CI to begin with.
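For context, this is roughly how parametrized cases pick up a `slowtest` marker (an illustrative sketch, not pytential's actual code), which is why `-m 'not slowtest'` should normally deselect the slow cases:

```python
# Illustrative sketch only (not pytential's actual code): pytest.param lets
# individual parametrize entries carry the slowtest marker, so running with
# `-m 'not slowtest'` deselects just those cases (e.g. -case2/-case3 here).
# The slowtest marker would be registered in conftest.py or setup.cfg.
import pytest


@pytest.mark.parametrize("case", [
    "case0",
    "case1",
    pytest.param("case2", marks=pytest.mark.slowtest),
    pytest.param("case3", marks=pytest.mark.slowtest),
])
def test_skeletonize_by_proxy_convergence(case):
    ...
```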
Used `py-spy top` to check out where it is, and it seems to be stuck in `np.svd` when computing the errors. I'm guessing the matrices are just too large...
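That would fit the cubic cost of a dense SVD; a rough timing sketch (the sizes are made up, not taken from the test), assuming it is numpy's dense `np.linalg.svd` underneath:

```python
# Rough illustration (sizes are made up): dense SVD cost grows roughly like
# O(n^3), so computing errors via singular values blows up quickly with size.
import time

import numpy as np

rng = np.random.default_rng(42)
for n in (500, 1000, 2000):
    a = rng.standard_normal((n, n))
    t0 = time.perf_counter()
    np.linalg.svd(a, compute_uv=False)   # singular values only
    print(f"n = {n:4d}: {time.perf_counter() - t0:.2f}s")
```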
Still, there might be something to your theory: The slow runs show 236 tests ("2 failed, 233 passed, 1 skipped"), whereas the manageable-time ones show 229 ("228 passed, 1 skipped").
Slow runs are running only slowtests because of `PYTEST_ADDOPTS: -kslowtest`. This used to be `PYTEST_ADDOPTS: -m 'not slowtest'`.
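`PYTEST_ADDOPTS` is read by pytest at startup and prepended to the command line, so whatever the runner exports silently changes test selection. A quick way to see what a given value would select, sketched with an assumed `test/` path:

```python
# Sketch: check what a given PYTEST_ADDOPTS value would select, without
# running anything. pytest reads PYTEST_ADDOPTS at startup and prepends it
# to the command line; `-m 'not slowtest'` deselects marker-tagged tests,
# while `-kslowtest` keeps only tests whose keywords match "slowtest".
import os

import pytest

os.environ["PYTEST_ADDOPTS"] = "-m 'not slowtest'"   # or "-kslowtest"
# --collect-only -q lists the selected test IDs without executing them.
raise SystemExit(pytest.main(["test/", "--collect-only", "-q"]))
```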
Do you know who might be setting that env variable?
You're right! Seems to be back to normal on https://gitlab.tiker.net/inducer/pytential/-/jobs/539024. Not sure what's going on there...?
I'm really not sure what happened here, but :shrug: we may not find out. I'll take the mysterious recovery and say we're done here.
Thanks everyone!
Recently observed on bock:
pip freeze from that run on bock
cc @alexfikl because his test is a winner, accounting for five of those hours
cc @isuruf because we discussed this on Monday