A similar issue may affect meshmode: https://gitlab.tiker.net/inducer/pytato/-/jobs/538649
Before this started happening:
============================= slowest 10 durations =============================
958.88s call test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case9]
552.57s call test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-1-7-5-False]
452.24s call test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case0]
444.78s call test/test_cost_model.py::test_cost_model_correctness[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-2-True-False]
444.33s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case0]
424.14s call test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-case1]
363.69s call test/test_layer_pot.py::test_off_surface_eval[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-True]
351.86s call test/test_layer_pot_eigenvalues.py::test_sphere_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-sumpy-2-3-3]
246.16s call test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-2-7-5-False]
229.97s call test/test_cost_model.py::test_cost_model_correctness[<PyOpenCLArrayContext for <pyopencl.Device 'pthread-Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz' on 'Portable Computing Language'>>-3-False-False]
=========================== short test summary info ============================
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
========= 228 passed, 1 skipped, 18489 warnings in 2603.72s (0:43:23) ==========
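For reference, that "slowest 10 durations" table is pytest's `--durations` report; a minimal sketch of reproducing it locally (the `test/` path is an assumption about the checkout layout):

```python
# Minimal sketch: reproduce a "slowest N durations" report for a local
# checkout; "test/" is an assumption about where the suite lives.
import pytest

raise SystemExit(pytest.main(["test/", "--durations=10"]))
```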
Those pip freezes are literally identical.
Recent runs of https://github.com/inducer/pytential/pull/195 are affected on GitHub, too.
Found an old build environment (from March 22, on koelsch, in /var/lib/gitlab-runner/builds/0d8732fb/0/inducer/pytential/test). By date, it should be from before this all started going poorly, according to https://gitlab.tiker.net/inducer/pytential/-/pipelines. Now, unfortunately, the only difference in `pip freeze` is `platformdirs` going from 3.1.1 to 3.2.0, which I'm not sure is relevant.
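For anyone repeating that comparison, a small sketch of diffing two saved `pip freeze` outputs; the file names here are made up:

```python
# Small sketch: diff two saved `pip freeze` outputs. The file names are
# made up; substitute whatever the old and new environments were saved as.
import difflib

with open("freeze-old.txt") as f:
    old = f.read().splitlines()
with open("freeze-new.txt") as f:
    new = f.read().splitlines()

print("\n".join(difflib.unified_diff(old, new, "freeze-old.txt", "freeze-new.txt", lineterm="")))
```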
Whatever it is, it's affecting both Conda and bare-venv runs: https://gitlab.tiker.net/inducer/pytential/-/pipelines/409699
Hm, just ran the `test_linalg_skeletonization` test locally and it seems to be doing just fine.
I'm a bit confused by the `-case3` at the end of that, though, since the test has 4 cases it runs with and 2 are marked as slow:
https://github.com/inducer/pytential/blob/b63b97965a1a2ef56155dcdc46d0db9fb36e6a24/test/test_linalg_skeletonization.py#L386-L392
Is the CI suddenly running the slow tests too?
EDIT: Take some of that back, I also ran it with `-m 'not slowtest'`. Running `-case3` also seems to take a whole lot of time, but that doesn't explain why it's on the CI to begin with.
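For context, this is roughly how parametrized cases pick up a `slowtest` marker (an illustrative sketch, not pytential's actual code), which is why `-m 'not slowtest'` should normally deselect the slow cases:

```python
# Illustrative sketch only (not pytential's actual code): pytest.param lets
# individual parametrize entries carry the slowtest marker, so running with
# `-m 'not slowtest'` deselects just those cases (e.g. -case2/-case3 here).
# The slowtest marker would be registered in conftest.py or setup.cfg.
import pytest


@pytest.mark.parametrize("case", [
    "case0",
    "case1",
    pytest.param("case2", marks=pytest.mark.slowtest),
    pytest.param("case3", marks=pytest.mark.slowtest),
])
def test_skeletonize_by_proxy_convergence(case):
    ...
```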
Used `py-spy top` to check out where it is, and it seems to be stuck in `np.svd` when computing the errors. I'm guessing the matrices are just too large...
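That would fit the cubic cost of a dense SVD; a rough timing sketch (the sizes are made up, not taken from the test), assuming it is numpy's dense `np.linalg.svd` underneath:

```python
# Rough illustration (sizes are made up): dense SVD cost grows roughly like
# O(n^3), so computing errors via singular values blows up quickly with size.
import time

import numpy as np

rng = np.random.default_rng(42)
for n in (500, 1000, 2000):
    a = rng.standard_normal((n, n))
    t0 = time.perf_counter()
    np.linalg.svd(a, compute_uv=False)   # singular values only
    print(f"n = {n:4d}: {time.perf_counter() - t0:.2f}s")
```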
Still, there might be something to your theory: The slow runs show 236 tests ("2 failed, 233 passed, 1 skipped"), whereas the manageable-time ones show 229 ("228 passed, 1 skipped").
Slow runs are running only slowtests because of `PYTEST_ADDOPTS: -kslowtest`. This used to be `PYTEST_ADDOPTS: -m 'not slowtest'`.
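`PYTEST_ADDOPTS` is read by pytest at startup and prepended to the command line, so whatever the runner exports silently changes test selection. A quick way to see what a given value would select, sketched with an assumed `test/` path:

```python
# Sketch: check what a given PYTEST_ADDOPTS value would select, without
# running anything. pytest reads PYTEST_ADDOPTS at startup and prepends it
# to the command line; `-m 'not slowtest'` deselects marker-tagged tests,
# while `-kslowtest` keeps only tests whose keywords match "slowtest".
import os

import pytest

os.environ["PYTEST_ADDOPTS"] = "-m 'not slowtest'"   # or "-kslowtest"
# --collect-only -q lists the selected test IDs without executing them.
raise SystemExit(pytest.main(["test/", "--collect-only", "-q"]))
```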
Do you know who might be setting that env variable?
You're right! Seems to be back to normal on https://gitlab.tiker.net/inducer/pytential/-/jobs/539024. Not sure what's going on there...?
I'm really not sure what happened here, but :shrug: we may not find out. I'll take the mysterious recovery and say we're done here.
Thanks everyone!
Recently observed on bock:
pip freeze from that run on bock
cc @alexfikl because his test is a winner, accounting for five of those hours
cc @isuruf because we discussed this on Monday