I just ran that test over and over on dunkel (with 5525398), while simulating a resource-constrained environment:
while true; do rm-vis; POCL_MAX_PTHREAD_COUNT=4 PYOPENCL_TEST=port numactl -N 0,1 python ./laplace-dirichlet-3d.py || break ; done
It ran around 100 times without ever encountering the issue.
Ran the same command on a conda installation (which should be the same as what the GitHub CIs run?). It ran for about seven hours without crashing once.
In #61, I'm trying to use the (very slick-looking, actually) tmate, via their GitHub Action, to get a shell on the runner and see what's happening.
I'm now running
i=0; while true; do i=$((i+1)); echo "RUN NUMBER $i";rm -f *vtu *vts; PYOPENCL_TEST=port python ./laplace-dirichlet-3d.py || break ; done
interactively on the Azure machine behind GitHub Actions, and the test has run successfully ~10 times so far (which doesn't match the stats otherwise). Maybe this is somehow dependent on the processor type? This one is an Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz.
ci-support will now log the CPU type: https://github.com/inducer/ci-support/commit/610dcc4376ce290fb17f851bc21eba464ce27205
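For reference, a rough Python equivalent of that logging (illustrative only; the actual ci-support change presumably does this in shell):

# Print the CPU model string on Linux, to correlate failures with machine type.
with open("/proc/cpuinfo") as f:
    print(next(line.strip() for line in f if line.startswith("model name")))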
OK, so much for the theory that this is system-dependent.
I think I've found a way to reproduce the problem somewhat reliably. Run the examples build on #61 and then
bash # start a new shell so that the tmux window doesn't close when an error occurs
(export PYOPENCL_TEST=port:pthread; source .miniforge3/bin/activate testing ; source ci-support.sh ; run_examples)
Interesting: doing the same thing multiple times gives different results:
diff --git a/examples/laplace-dirichlet-3d.py b/examples/laplace-dirichlet-3d.py
index 48a737a..865b1c3 100644
--- a/examples/laplace-dirichlet-3d.py
+++ b/examples/laplace-dirichlet-3d.py
@@ -130,9 +130,20 @@ def main(mesh_name="torus", visualize=False):
 
     bvp_rhs = bind(places, sqrt_w*sym.var("bc"))(actx, bc=bc)
 
+    spop = bound_op.scipy_op(actx, "sigma", dtype=np.float64)
+
+    from meshmode.dof_array import flat_norm
+    for i in range(10):
+        res = spop.matvec(bvp_rhs)
+        print(i, actx.np.linalg.norm(res))
+    1/0
+
+
+
+
     from pytential.solve import gmres
     gmres_result = gmres(
-            bound_op.scipy_op(actx, "sigma", dtype=np.float64),
+            spop,
             bvp_rhs, tol=1e-14, progress=True,
             stall_iterations=0,
             hard_failure=True)
gives
0 18271.189240367086
1 18266.885665550613
2 18284.471850638434
3 18289.173559412557
4 18289.142404135426
5 18284.471850638434
6 18284.471850638434
7 18306.65272297942
8 18306.65272297942
9 18290.04129860244
For context: this prints the norm of what should be one and the same vector.
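To replicate this kind of check outside the example, here is a generic sketch; op stands in for bound_op.scipy_op(...) and rhs for bvp_rhs, both assumed to be numpy-compatible here:

import numpy as np

def check_matvec_determinism(op, rhs, nruns=10):
    # Apply the same operator to the same vector repeatedly; any mismatch
    # between runs indicates nondeterministic evaluation.
    ref = np.asarray(op.matvec(rhs))
    for i in range(1, nruns):
        res = np.asarray(op.matvec(rhs))
        if not np.array_equal(ref, res):
            print(f"run {i}: max abs deviation {np.max(np.abs(ref - res))}")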
This diff still yields different matvec results each time the FMM is run:
diff --git a/examples/laplace-dirichlet-3d.py b/examples/laplace-dirichlet-3d.py
index 48a737a..3148440 100644
--- a/examples/laplace-dirichlet-3d.py
+++ b/examples/laplace-dirichlet-3d.py
@@ -130,9 +130,20 @@ def main(mesh_name="torus", visualize=False):
 
     bvp_rhs = bind(places, sqrt_w*sym.var("bc"))(actx, bc=bc)
 
+    spop = bound_op.scipy_op(actx, "sigma", dtype=np.float64)
+
+    from meshmode.dof_array import flat_norm
+    for i in range(10):
+        res = spop.matvec(bvp_rhs)
+        print(i, actx.np.linalg.norm(bvp_rhs), actx.np.linalg.norm(res))
+    1/0
+
+
+
+
     from pytential.solve import gmres
     gmres_result = gmres(
-            bound_op.scipy_op(actx, "sigma", dtype=np.float64),
+            spop,
             bvp_rhs, tol=1e-14, progress=True,
             stall_iterations=0,
             hard_failure=True)
diff --git a/pytential/qbx/fmm.py b/pytential/qbx/fmm.py
index a58524b..4118a22 100644
--- a/pytential/qbx/fmm.py
+++ b/pytential/qbx/fmm.py
@@ -543,23 +543,23 @@ def drive_fmm(expansion_wrangler, src_weight_vecs, timing_data=None,
     recorder.add("translate_box_multipoles_to_qbx_local", timing_future)
 
-    qbx_expansions = qbx_expansions + local_result
+    qbx_expansions = local_result
 
     local_result, timing_future = (
             wrangler.translate_box_local_to_qbx_local(local_exps))
     recorder.add("translate_box_local_to_qbx_local", timing_future)
 
-    qbx_expansions = qbx_expansions + local_result
+    #qbx_expansions = local_result
 
     qbx_potentials, timing_future = wrangler.eval_qbx_expansions(qbx_expansions)
     recorder.add("eval_qbx_expansions", timing_future)
 
-    ts_result, timing_future = \
-            wrangler.eval_target_specific_qbx_locals(src_weight_vecs)
+    #ts_result, timing_future = \
+    #        wrangler.eval_target_specific_qbx_locals(src_weight_vecs)
 
-    qbx_potentials = qbx_potentials + ts_result
+    #qbx_potentials = qbx_potentials + ts_result
 
     recorder.add("eval_target_specific_qbx_locals", timing_future)
This suggests that m2qbxl may have a role to play in this. The results of the remainder of the FMM don't seem to "wobble".
It's the sumpy backend that has the issue. Both the single- and double-layer kernels exhibit the problem.
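For anyone retracing this: the backend switch is just a keyword argument on the layer potential source. A sketch (constructor arguments abbreviated and illustrative, lifted loosely from the example script; only fmm_backend matters here):

from pytential.qbx import QBXLayerPotentialSource

# Per the above, the issue appears with fmm_backend="sumpy" (the default);
# switching to "fmmlib" is the natural cross-check.
qbx = QBXLayerPotentialSource(
        pre_density_discr, fine_order=4*target_order,
        qbx_order=qbx_order, fmm_order=qbx_order + 3,
        fmm_backend="fmmlib")

The diff below additionally comments out the double-layer term, so that the single-layer kernel can be observed in isolation: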
diff --git a/examples/laplace-dirichlet-3d.py b/examples/laplace-dirichlet-3d.py
index 48a737a..1b57a80 100644
--- a/examples/laplace-dirichlet-3d.py
+++ b/examples/laplace-dirichlet-3d.py
@@ -102,7 +102,7 @@ def main(mesh_name="torus", visualize=False):
     bdry_op_sym = (loc_sign*0.5*sigma_sym
             + sqrt_w*(
                 sym.S(kernel, inv_sqrt_w_sigma, qbx_forced_limit=+1)
-                + sym.D(kernel, inv_sqrt_w_sigma, qbx_forced_limit="avg")
+                #sym.D(kernel, inv_sqrt_w_sigma, qbx_forced_limit="avg")
                 ))
 
     # }}}
@@ -130,9 +130,20 @@ def main(mesh_name="torus", visualize=False):
 
     bvp_rhs = bind(places, sqrt_w*sym.var("bc"))(actx, bc=bc)
 
+    spop = bound_op.scipy_op(actx, "sigma", dtype=np.float64)
+
+    from meshmode.dof_array import flat_norm
+    for i in range(10):
+        res = spop.matvec(bvp_rhs)
+        print(i, actx.np.linalg.norm(bvp_rhs), actx.np.linalg.norm(res))
+    1/0
+
+
+
+
     from pytential.solve import gmres
     gmres_result = gmres(
-            bound_op.scipy_op(actx, "sigma", dtype=np.float64),
+            spop,
             bvp_rhs, tol=1e-14, progress=True,
             stall_iterations=0,
             hard_failure=True)
diff --git a/pytential/qbx/fmm.py b/pytential/qbx/fmm.py
index a58524b..63967a8 100644
--- a/pytential/qbx/fmm.py
+++ b/pytential/qbx/fmm.py
@@ -284,6 +284,7 @@ non_qbx_box_target_lists`),
 
             events.append(evt)
             wait_for = [evt]
             assert qbx_expansions_res is qbx_expansions
+            print(isrc_level, np.sum(qbx_expansions.get()))
 
             qbx_expansions.add_event(evt)
This, along with the changes above restricting qbx_expansions to just the M2QBXL contribution, gives:
0 0.0
1 0.0
2 0.0
3 250517.75028195637
4 2515898.6674050163
0 15.543459679529555 12101.463403771833
0 0.0
1 0.0
2 0.0
3 250517.75028195637
4 2492680.9811040373
1 15.543459679529555 12073.09081650059
0 0.0
1 0.0
2 0.0
3 250517.75028195637
4 2515898.6674050163
2 15.543459679529555 12183.17744099083
0 0.0
1 0.0
2 0.0
3 250517.75028195637
4 2492680.9811040373
3 15.543459679529555 12073.09081650059
Two observations:

- qbx_expansions appears stable at level 3; level 4 "wobbles".
- Even for identical qbx_expansions, the norm of the result vector can still have additional "wobbles".

Found something! Turns out the generated code for m2qbxl contains this gem:
for (int icenter_inner = 0; icenter_inner <= (-16 + ncenters + -16 * gid(0) >= 0 ? 15 : -1 + ncenters + -16 * gid(0)); ++icenter_inner)
{
if (icontaining_tgt_box != -1)
{
acc_isrc_box = 0.0;
acc_isrc_box_12 = 0.0;
acc_isrc_box_4 = 0.0;
acc_isrc_box_0 = 0.0;
}
icontaining_tgt_box = qbx_center_to_target_box_source_level[16 * gid(0) + icenter_inner];
if (icontaining_tgt_box != -1)
{
acc_isrc_box_3 = 0.0;
acc_isrc_box_9 = 0.0;
acc_isrc_box_21 = 0.0;
isrc_stop = src_box_starts[icontaining_tgt_box + 1];
tgt_rscale = qbx_expansion_radii[16 * gid(0) + icenter_inner];
isrc_start = src_box_starts[icontaining_tgt_box];
acc_isrc_box_19 = 0.0;
Observe how icontaining_tgt_box is used in the first conditional before it has even been set :facepalm:.
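To spell out the failure mode, here is a tiny Python rendering of the loop structure above (purely illustrative, not generated code):

def miscompiled_loop(qbx_center_to_target_box_source_level):
    icontaining_tgt_box = -12345  # stand-in for uninitialized C storage
    for icenter_inner in range(16):
        if icontaining_tgt_box != -1:  # BUG: stale/uninitialized read
            acc_isrc_box = 0.0  # accumulator reset guarded by a stale value
        icontaining_tgt_box = \
            qbx_center_to_target_box_source_level[icenter_inner]
        if icontaining_tgt_box != -1:
            pass  # rest of the body correctly sees the fresh value

On the first trip through the loop, the accumulator reset is guarded by a value that was never assigned; on later trips, by the previous iteration's value.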
It's almost hard to understand how this code worked for anyone, anywhere... In addition to fixing this code generation, we also need to make it harder to use loopy in a way that ends up with broken code like this.
https://github.com/inducer/loopy/pull/235 should address this.
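For illustration, this is the kind of thing explicit instruction IDs and dependencies in loopy are for. A minimal, made-up sketch (not the actual m2qbxl kernel) in which the {id=fetch}/{dep=fetch} annotations keep the scheduler from moving the use ahead of the fetch:

import numpy as np
import loopy as lp

knl = lp.make_kernel(
    "{[i]: 0 <= i < n}",
    """
    <> tgt_box = box_lookup[i]    {id=fetch}
    out[i] = data[tgt_box] + 1    {dep=fetch}
    """,
    [
        lp.GlobalArg("box_lookup", np.int32, shape=("n",)),
        lp.GlobalArg("data", np.float64, shape=("nboxes",)),
        lp.GlobalArg("out", np.float64, shape=("n",)),
        lp.ValueArg("n", np.int32),
        lp.ValueArg("nboxes", np.int32),
    ])

# Inspect the generated OpenCL to confirm the ordering.
print(lp.generate_code_v2(knl).device_code())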
I've only observed this on GitHub, not GitLab. This may indicate that it's less likely to happen when more cores are available, i.e. possibly a synchronization issue.
Here's a run that exhibits this: https://github.com/inducer/pytential/runs/1832875842?check_suite_focus=true
As far as I remember, this has been happening for a while. I think I have an email from Jan 16 where this occurred, potentially even earlier. Delinquently, I've often just rerun the CI job, which would typically resolve the failure, until it reoccurred on a subsequent run.
The recent failures under discussion in #56 are distinct IMO.