inducer opened this issue 1 year ago
For context, this happened because I was working on https://github.com/inducer/pytato/pull/393 and needed the persistent hash key builder to just "understand" expression values. Loopy has some absurd hack job to make this possible, but I wasn't itching to replicate it in pytato. This seemed nicer.
Generally seems like a good idea to me!
One thing that doesn't quite spark joy is having to remember which classes are `dataclass` and which are `attrs` in more places :( But it's hard to argue with the hash caching: the hash showed up in quite a few profiles, so making it fast would be a priority for me too.
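The hash caching being discussed could be sketched roughly like this. This is an illustrative stand-in, not pymbolic's actual implementation: the class name, field, and the `_hash_value` attribute are all hypothetical. The idea is that a frozen dataclass computes its structural hash once and memoizes it on the instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sum:
    # Illustrative stand-in for a pymbolic-style expression node.
    children: tuple

    def __hash__(self):
        # Cache the (potentially expensive) structural hash on first use.
        # object.__setattr__ is needed because the dataclass is frozen.
        # Note: an explicitly defined __hash__ is preserved by the
        # @dataclass machinery even with frozen=True.
        h = self.__dict__.get("_hash_value")
        if h is None:
            h = hash((type(self).__name__, self.children))
            object.__setattr__(self, "_hash_value", h)
        return h
```

For deeply nested expression trees this turns repeated hashing (e.g. from dict lookups during traversal) from O(tree size) into O(1) after the first call, which is where the profile wins would come from.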
I'm guessing I may revert this to dataclasses, for exactly this reason. Not sure.
I've changed this to dataclasses, and the tests in pymbolic itself seem to be passing. We'll see how the downstreams do.
Almost everything seems to be working, except that pytential has a puzzling failure.
Thanks for finding the smaller reproducer! That helped quite a bit. The issue was the missing rewriting from `None` to `cse_scope.EVALUATION` in `CommonSubexpression`. It also helped expose https://github.com/inducer/pytential/issues/209, which shows that, as usual, the one issue you thought you had is actually two. :slightly_smiling_face:
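A minimal sketch of this kind of fix, using simplified stand-in definitions rather than pymbolic's actual ones (the `cse_scope.EVALUATION` value and the fields shown here are illustrative): normalize `None` to the default scope at construction time via `__post_init__`, so that downstream code, hashing, and equality never observe the raw `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Illustrative stand-in for pymbolic's cse_scope constants;
# the actual value is not reproduced here.
class cse_scope:
    EVALUATION = "evaluation"


@dataclass
class CommonSubexpression:
    child: Any
    scope: Optional[str] = None

    def __post_init__(self):
        # Rewrite a missing scope to the default, so two CSEs built
        # with scope=None and scope=cse_scope.EVALUATION compare equal.
        if self.scope is None:
            self.scope = cse_scope.EVALUATION
```

Without this normalization, a `None`-scoped node and an explicitly `EVALUATION`-scoped node hash and compare as different expressions, which is exactly the kind of subtle mismatch a dataclass conversion can expose.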
Super weird. I wonder why the pytential run is getting canceled. Out of memory for some reason? If so, why would this PR make a difference?
Identical PR on Gitlab: https://gitlab.tiker.net/inducer/pymbolic/-/merge_requests/64
Tried to investigate more. Here's the time profile of the pytential tests, first with this patch:
```
=========================== slowest 10 durations ===========================
1357.39s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case3]
1355.53s call test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator7-solution7]
914.25s call test/test_stokes.py::test_exterior_stokes[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-3]
746.09s call test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator6-solution6]
724.49s call test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case9]
685.49s call test/test_layer_pot_identity.py::test_identity_convergence_slow[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
640.87s call test/test_maxwell.py::test_pec_mfie_extinction[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
615.13s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
499.75s call test/test_layer_pot_identity.py::test_identity_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case1]
264.22s call test/test_layer_pot_eigenvalues.py::test_ellipse_eigenvalues[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-1-5-3-False]
========================== short test summary info =========================
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
======= 237 passed, 1 skipped, 16198 warnings in 4015.96s (1:06:55) ========
```
and then without:
```
=========================== slowest 10 durations ===========================
2185.13s call test/test_stokes.py::test_exterior_stokes[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-3]
2146.56s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case3]
1560.74s call test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator7-solution7]
912.97s call test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator6-solution6]
786.50s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
631.47s call test/test_maxwell.py::test_pec_mfie_extinction[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case0]
421.05s call test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case9]
265.36s call test/test_beltrami.py::test_beltrami_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-operator5-solution5]
231.81s call test/test_linalg_skeletonization.py::test_skeletonize_by_proxy_convergence[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case1]
188.92s call test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz' on 'Portable Computing Language'>>-case10]
========================== short test summary info =========================
SKIPPED [1] test_linalg_proxy.py:200: 3d partitioning requires a tree
======== 237 passed, 1 skipped, 1916 warnings in 3817.94s (1:03:37) ========
```
(run on tripel, back to back with https://github.com/inducer/pytential/commit/488b958f2a951141d91fa59e598c617c43144c36 and https://github.com/inducer/sumpy/commit/8960662fcc4fb8cb89ce7767d03948a29b562209)
The differences are... interesting. I'm not sure I understand them.
Tried to investigate more.
Ran this on `koelsch` as well and didn't see as much of a toss-up between the timings of the various tests. The total runtime was pretty similar to your run, though. I also ran (in `pytential`)

```
mprof run python -m pytest -v -s -k 'not slowtest' --durations=25
```

to get the memory profile, and it seems pretty similar as well (blue is the `main` branch and black is the `attrs` branch).

Sooo.. really not sure why the GitHub CI is crashing..
EDIT: Not sure how accurate this is, but GitHub reports its runners as having 7 GB of memory (while above we're using 12 GB): https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
In a way, I think it's no surprise that the runners get canceled: they have about 7 GB of memory available, plus 4 GB of swap, so a 12 GB peak allocation is definitely flirting with disaster, and which run gets canceled is probably just coincidence. Perhaps the more useful question is why the memory curve is monotonically increasing: the tests should all be independent, so there shouldn't be any growth from one test to the next.
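One stdlib-only way to hunt for that kind of cross-test growth is to diff `tracemalloc` snapshots around each test. This is a crude sketch (the helper name is made up, and `tracemalloc` only sees Python-level allocations, so leaks inside PyOpenCL's C layer would be invisible to it; `mprof`'s RSS sampling remains the better overall measure):

```python
import tracemalloc


def snapshot_growth(fn, *args):
    """Run fn and return the net growth (in bytes) of Python-level
    allocations that survive the call -- a crude way to spot tests
    that leave memory behind."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    fn(*args)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    stats = after.compare_to(before, "lineno")
    return sum(s.size_diff for s in stats)


# Example: a "test" that leaks by appending to a module-level cache.
_cache = []


def leaky_test():
    _cache.append(bytearray(10**6))  # ~1 MB retained after the call


growth = snapshot_growth(leaky_test)
```

Wrapping each pytest test in something like this (e.g. via an autouse fixture) would point at which tests, if any, are retaining objects between runs.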
I propose we use https://github.com/inducer/sumpy/issues/178 to have this discussion, because that looks eerily related.
OK, so https://github.com/inducer/sumpy/issues/178 seems to be a different issue.
Made https://github.com/inducer/pytential/issues/213 to discuss the unbounded memory growth.
Just wanted to write this down before I forget it again! After a run on `koelsch`, the `pytential` test

```
test/test_scalar_int_eq.py::test_integral_equation[<PyOpenCLArrayContext for <pyopencl.Device 'cpu-Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz' on 'Portable Computing Language'>>-case9]
```

uses noticeably more memory with `pymbolic@attrs` than with `pymbolic@main`, so it may be a good candidate to look at to see what's ballooning the memory.
Also slightly related: I tried hacking `__slots__` into all the `Expression`s from pymbolic, since that should reduce memory a bit, but the CI still didn't pass, so I didn't bother looking into it further..
Surprisingly, this seems to somewhat work (current CI notwithstanding). @kaushikcfd @alexfikl @isuruf What do you think?