sampotter / butterfly

A library for butterfly and hierarchical matrix factorizations.
https://github.com/sampotter/butterfly
Apache License 2.0
5 stars 2 forks source link

Hitting leaf node prematurely in `bf_lbo` on MFEM sphere #16

Closed pbeckman closed 2 months ago

pbeckman commented 3 months ago

For the smallest mesh /h/4/, I stream some eigenpairs and hit a leaf node prematurely?

(lldb) ./bf_lbo --matPath=../../../../butterfly-LBO-models/sphere_meshes/h/4/ --useOctree --tol=1e-3 --freqTreeDepth=2
Process 64268 launched: '/Users/beckman/Documents/Research/NYU/potter-butterfly-GP/code/butterfly-beckman-fork/builddir/examples/lbo/bf_lbo' (arm64)
using FEM matrices loaded from binaries in directory ../../../../butterfly-LBO-models/sphere_meshes/h/4/ (4098 verts)
wrote octree cells to octree_boxes.txt
row tree has depth 6
computed lambda_max = 11533.3 [0.2s]
building frequency tree with depth 2 (k = 8)
streaming full eigenvector matrix
feed: bracket = (-inf, 2.82), num. eigs = 4
- streamed 4 eigs (0.1% of total)
- compressed size:   0.125 MB
- uncompressed size: 0.125 MB
- compression rate:  0.998
feed: bracket = [2.82, 11.26), num. eigs = 4
- streamed 8 eigs (0.2% of total)
- compressed size:   0.251 MB
- uncompressed size: 0.250 MB
- compression rate:  0.998
feed: bracket = [11.26, 25.34), num. eigs = 16
- streamed 24 eigs (0.6% of total)
- compressed size:   0.751 MB
- uncompressed size: 0.750 MB
- compression rate:  0.999
feed: bracket = [25.34, 45.05), num. eigs = 24
Process 64268 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x100569610)
    frame #0: 0x0000000100569610 libbutterfly.dylib`bfFacStreamerFeed(facStreamer=0x000000013060dee0, Phi=0x000000013060e000) at fac_streamer.c:447:5
   444 
   445      BF_ASSERT(Psi == NULL);
   446      BF_ASSERT(W == NULL);
-> 447      BF_ASSERT(!bfTreeNodeIsLeaf(rowNode));
   448 
   449      /* Push the children of the current row node onto the stack in
   450       * reverse order so that we traverse `mat` top to bottom */
Target 0: (bf_lbo) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x100569610)
  * frame #0: 0x0000000100569610 libbutterfly.dylib`bfFacStreamerFeed(facStreamer=0x000000013060dee0, Phi=0x000000013060e000) at fac_streamer.c:447:5
    frame #1: 0x000000010056f044 libbutterfly.dylib`bfLboFeedFacStreamerNextEigenband(facStreamer=0x000000013060dee0, freqs=0x000000013060acb0, L=0x0000000130704720, M=0x00000001307047d0) at lbo.c:134:3
    frame #2: 0x0000000100003340 bf_lbo`main(argc=5, argv=0x000000016fdfed38) at bf_lbo.c:396:5
    frame #3: 0x000000019d0960e0 dyld`start + 2360
pbeckman commented 3 months ago

You mentioned in your emails that ./bf_lbo was running fine on the MFEM spheres you tried --- are there different parameters I should be using?

sampotter commented 3 months ago

OK, two things.

First, when I ran this test, I got pretty different output than you:

using FEM matrices loaded from binaries in directory ../../../../butterfly-LBO-models/sphere_meshes/h/4/ (4098 verts)
wrote octree cells to octree_boxes.txt
row tree has depth 6
computed lambda_max = 11533.3 [0.7s]
building frequency tree with depth 2 (k = 8)
streaming full eigenvector matrix
feed: bracket = (-inf, 2.82), num. eigs = 4
- streamed 4 eigs (0.1% of total)
- compressed size:   0.125 MB
- uncompressed size: 0.125 MB
- compression rate:  0.998
feed: bracket = [2.82, 11.26), num. eigs = 5
- streamed 9 eigs (0.2% of total)
- compressed size:   0.282 MB
- uncompressed size: 0.281 MB
- compression rate:  0.999
feed: bracket = [11.26, 25.34), num. eigs = 18
- streamed 27 eigs (0.7% of total)
- compressed size:   0.845 MB
- uncompressed size: 0.844 MB
- compression rate:  0.999
feed: bracket = [25.34, 45.05), num. eigs = 25
- streamed 52 eigs (1.3% of total)
- compressed size:   1.600 MB
- uncompressed size: 1.626 MB
- compression rate:  1.016
feed: bracket = [45.05, 70.39), num. eigs = 17
- streamed 69 eigs (1.7% of total)
- compressed size:   2.131 MB
- uncompressed size: 2.157 MB
- compression rate:  1.012
feed: bracket = [70.39, 101.37), num. eigs = 36

I copied and pasted your command-line invocation, so... hmm. Not sure.

Second, despite the differences, this nevertheless hit the same assert.

I fired up gdb and stepped through what's going on and I'm not sure if I would really call it a "bug". Obviously, an assert isn't fun to trip over, but it actually caught a corner case that was unhandled.

For an explanation, recall that we're using an "adaptive" butterfly factorization. When we factorize a block column of $\Phi$, we start with the entire $m \times n_{\mbox{block}}$ matrix, then:

  1. Attempt to compute a low-rank factorization of $\Phi$ [Footnote 1].
  2. If the block has too few rows or columns, just emit the current block and continue.
  3. Check if we can truncate it at all [Footnote 2]. If we can, do so and declare victory.
  4. Otherwise, push down to the children of the space tree node associated with the current subblock and recursively continue from (1).

This recurrence is embodied in the while loop on line 423 of fac_streamer.c. It uses a stack instead of actual functional recursion to ensure that the column block is traversed from top to bottom.

So, when that assert trips, we've reached a point where we have a block that isn't small enough (caught by 2), that we can't truncate (caught by 3), and is a leaf node in the space tree (can't continue the recursion).

The simple thing to do here is to just emit the block and keep moving. But I think it actually is a very useful signal telling us that we should give up. For the problem setup in question, it's a clear indicator that we're trying to compress noise. Doing so would be a waste and we should probably give up streaming entirely.

Let me know if you have any thoughts. I know how to "fix" this, but I want you to wrap your head around what's going on before I do, because it's sort of crucial to how we design our numerical experiments.


Footnote 1: This actually means we end up doing an SVD of the first column block at "the top of the space tree" (maximum number of rows). This is a semi-orthogonal matrix, so this is a total waste. I need to add an option to skip this. It can indeed be useful for some kinds of butterfly factorizations, but not for LBO.

Footnote 2: We should come up with a smarter truncation strategy. We want to save memory. Only dropping a few terms can easily result in the SVD being larger than the original matrix. We need to estimate the number of bytes used and only truncate if it's smaller.

sampotter commented 3 months ago

OK, I did go ahead and add a flag to control this since I had some more time to work. New run:

using FEM matrices loaded from binaries in directory ../../../../butterfly-LBO-models/sphere_meshes/h/4/ (4098 verts)
wrote octree cells to octree_boxes.txt
row tree has depth 6
computed lambda_max = 11533.3 [0.7s]
building frequency tree with depth 2 (k = 8)
streaming full eigenvector matrix
feed: bracket = (-inf, 2.82), num. eigs = 4
- streamed 4 eigs (0.1% of total) in 1.09s
- uncompressed size: 0.125 MB
- total time: 1.09s
- compressed size:   0.125 MB
- compression rate:  0.998
feed: bracket = [2.82, 11.26), num. eigs = 5
- streamed 9 eigs (0.2% of total) in 2.26s
- uncompressed size: 0.281 MB
- total time: 2.26s
- compressed size:   0.282 MB
- compression rate:  0.999
feed: bracket = [11.26, 25.34), num. eigs = 18
- streamed 27 eigs (0.7% of total) in 1.58s
- uncompressed size: 0.844 MB
- total time: 1.58s
- compressed size:   0.845 MB
- compression rate:  0.999
feed: bracket = [25.34, 45.05), num. eigs = 25
- streamed 52 eigs (1.3% of total) in 3.78s
- uncompressed size: 1.626 MB
- total time: 3.79s
- compressed size:   1.600 MB
- compression rate:  1.016
feed: bracket = [45.05, 70.39), num. eigs = 17
- streamed 69 eigs (1.7% of total) in 2.18s
- uncompressed size: 2.157 MB
- total time: 2.18s
- compressed size:   2.131 MB
- compression rate:  1.012
feed: bracket = [70.39, 101.37), num. eigs = 36
- streamed 105 eigs (2.6% of total) in 4.30s
- uncompressed size: 3.283 MB
- total time: 4.31s
* bottomed out!
finished streaming butterfly factorization
- total time: 15.20s
- eigs time: 15.19s (99.9%)

I also updated the timing code a bit and double-checked that it was working correctly. Happy to see that (at least for a small problem) nearly all the time is eaten up by streaming eigenbands.

pbeckman commented 3 months ago

I'm still getting different output (number of eigs per slice mostly) than you, despite double checking that I'm using the most recent meshes... very spooky. But I'm seeing the same macro-level behavior, so I'll kick that can down the line for now.

Thanks for the note explaining this. That all seems sensible. I like the option that lets you fail gracefully, since I agree it's a good indication that you're trying to compress noise.

As you mention, this raises some questions about how to set up numerical experiments, but in terms of understanding this "bug" I would called this resolved.

sampotter commented 2 months ago

This is fixed by one of my recent commits.

FYI, now by default the output will look like this:

~/Dropbox/Research/Butterfly/butterfly/builddir/examples/lbo $ ./bf_lbo --matPath=../lbo_MFEM/sphere_meshes/h/4 --useOctree --tol=1e-3 --freqTreeDepth=2
using FEM matrices loaded from binaries in directory ../lbo_MFEM/sphere_meshes/h/4 (4098 verts)
wrote octree cells to octree_boxes.txt
row tree has depth 6
computed lambda_max = 11533.3 [0.7s]
building frequency tree with depth 2 (k = 8)
streaming the entire eigenvector matrix (PROBABLY A BAD IDEA!)
...

Fix by passing new flag --bailIfBottomedOut:

~/Dropbox/Research/Butterfly/butterfly/builddir/examples/lbo $ ./bf_lbo --matPath=../../../../butterfly-LBO-models/sphere_meshes/h/4/ --useOctree --tol=1e-3 --freqTreeDepth=2 --bailIfBottomedOut
using FEM matrices loaded from binaries in directory ../../../../butterfly-LBO-models/sphere_meshes/h/4/ (4098 verts)
wrote octree cells to octree_boxes.txt
row tree has depth 6
computed lambda_max = 11533.3 [0.7s]
building frequency tree with depth 2 (k = 8)
streaming until we bottom out in the space tree
...
...
...
feed: bracket = [70.39, 101.37), num. eigs = 36
- streamed 105 eigs (2.6% of total) in 4.31s
- uncompressed size: 3.283 MB
- total time: 4.32s
* bottomed out!
finished streaming butterfly factorization
- total time: 16.47s
- eigs time: 16.46s (99.9%)