weird artifact in T results at LSP

steo85it commented 3 years ago

Hi Sam, while testing my "outer mesh" stuff for ray tracing (as discussed last week), I found some weird artifacts in the T results. The "2x2 grid" appears both in my tests and using your lsp_spice "base example" (before my mods, too), both with quad and octree. Did you ever notice this? It's also visible on T (obviously) but not on the E plot. This particular output results from selecting i0, i1 = 929, 1831 j0, j1 = 500, 1435 from lunar_south_pole_80mpp_curvature.grd and

utc0 = '2021 FEB 15 00:00:00.00'
utc1 = '2021 MAR 15 00:00:00.00'

et0 = spice.str2et(utc0)
et1 = spice.str2et(utc1)
stepet = 3*24.*3600
nbet = int(np.ceil((et1 - et0) / stepet))
et = np.linspace(et0, et1, nbet, endpoint=False)

I have several examples of this happening. Also, I checked if anything would change when removing the if not self._blocks[i, j].is_empty_leaf from compressed_form_factors.py (just in case, as it is a recent change), but it doesn't seem to be related to it.

Still not 100% sure that it's not related to something I did, but your base example seems to present the same kind of issue...

sampotter commented 3 years ago

Yeah, I've definitely noticed this happen before. I am still not totally sure what the cause of it is. I have a couple guesses.

First, what tolerance are you using when you assemble the form factor matrix? I still need to change the default, but I would recommend using np.finfo(np.float32).resolution or np.finfo(np.float32).eps.

steo85it commented 3 years ago

I tried with both 1.e-5 and 1.e-7, but nothing changed. Not sure if it happens mostly in "dark areas" or if it's just more visible there because of the color scale.

sampotter commented 3 years ago

If it isn't too much trouble, do you think you make the following plots and attach them here?

Same as the plot in your first post, except computed using the uncompressed form factor matrix.
The difference between these two plots: "uncompressed T in shadow" - "compressed T in shadow".

And each with colorbars.

steo85it commented 3 years ago

Sure, getting to it. ;)

sampotter commented 3 years ago

Attaching a video which shows this problem a bit more clearly.

lsp_T.avi.zip

Here is a still:

lsp_T_052

Relevant parameters in lsp_make_shape_model.py:

i0, i1 = 2425, 2525
j0, j1 = 1912, 2012

and in lsp_spice.py:

et0 = spice.str2et('2011 MAR 01 00:00:00.00')
et1 = spice.str2et('2011 APR 01 00:00:00.00')
et = np.linspace(et0, et1, 100, endpoint=False)

also changes the plotting commands to:

plt.figure(figsize=(8, 8))
plt.tripcolor(*V[:, :2].T, F, t, vmin=vmin, vmax=vmax, cmap=cc.cm.rainbow)
plt.colorbar()
plt.savefig('lsp_T_%03d.png' % i)

because of a bug in flux.plot.tripcolor_vector (https://github.com/sampotter/python-flux/issues/11).

Made the movie using:

ffmpeg -i lsp_T_%03d.png -c:v huffyuv -pix_fmt rgb24 lsp_T.avi

sampotter commented 3 years ago

Adding videos for the groundtruth temperature as well as the error:

lsp_T_gt.avi.zip lsp_T_error.avi.zip

Still from the gt video:

lsp_T_gt_052

And from the error video:

lsp_T_error_052

steo85it commented 3 years ago

Ok, you were much faster. :+1: I have similar plots, showing indeed that the same happens not only in dark areas but all over the place.

path = os.path.join('.', 'lunar_south_pole_80mpp_curvature.grd')

i0, i1 = 1350, 1410
j0, j1 = 920, 1010

compressed FF: (un-compressed - compressed)FF:

I had not tried to process the "un-compressed" FF before... it's reeeeeeally slower!!

sampotter commented 3 years ago

Thanks for including your plots, it's helpful.

My suspicion is that this is caused by the truncated SVDs. I have been a little wary of using these anyway, because individual components of the left or right singular vectors can be negative => could potentially get a negative flux value. So I'm digging into this a bit.

I want to look at one part of the compressed FF matrix that is compressed using a truncated SVD and see what the effect is. So I'm looking at FF[I[2], I[0]]@E[I[0]], where I[i] (0 <= i < 4) are the quadrants indices for the first level of the quadtree. This is the same shape model as the one I used in my videos above.

Here is the irradiance on quadrant I[0]:

Screen Shot 2021-01-23 at 5 53 11 PM

Here is the groundtruth FF[I[2], I[0]]@E[I[0]]:

Screen Shot 2021-01-23 at 5 53 43 PM

Here is the version computed using a truncated SVD with 13 terms (note the negative values):

Screen Shot 2021-01-23 at 5 54 08 PM

And here is with 26 terms, just to check if maybe we're using way too few terms in our SVD (doesn't seem to be a great improvement, plus still negative values :-\ ):

Screen Shot 2021-01-23 at 5 56 30 PM

Using a nonnegative matrix factorization instead

My suspicion has been that using a nonnegative matrix factorization will be much more appropriate for compressing the FF matrix. Here is one computed with 13 terms using sklearn.decomposition.NMF (it is about an order of magnitude slower than svds, but seems like a good starting point for experimenting). NMF generally does a better job of splitting a nonnegative signal into spatially continuous "features": you can see that it manages to extract a number of important ones.

Screen Shot 2021-01-23 at 6 00 32 PM

A simple thing we can do is to try to correct the error by computing the residual for some tolerances. It should also be sparse. Here's what we get by truncating entries that are smaller than 0.1% the max value of the residual. (So, approximate FF as U*W' + R, where U*W' is the NMF and R is a sparse residual matrix.)

Screen Shot 2021-01-23 at 6 04 55 PM

Let's check the pointwise error:

Screen Shot 2021-01-23 at 6 52 40 PM

OK, this seems good. But the size of this matrix is pretty big. Comparing the sizes of the new compressed format and the block from the sparse FF matrix:

In [355]: W.data.nbytes + W.indices.nbytes + W.indptr.nbytes + H.data.nbytes + H.indices.nbytes + H.indptr.nbytes + res_trunc.data.nbyt
     ...: es + res_trunc.indices.nbytes + res_trunc.indptr.nbytes
![Screen Shot 2021-01-23 at 6 53 12 PM](https://user-images.githubusercontent.com/6035363/105617432-4054f600-5dac-11eb-9155-525274cf0c87.png)

Out[355]: 4603848

In [345]: block_gt.data.nbytes + block_gt.indices.nbytes + block_gt.indptr.nbytes
Out[345]: 4156412

OK, so it's larger than the sparse matrix. But now we actually have a meaningful tolerance parameter that the user could play with (magnitude of components in residual below which we should truncate). Also, for a fixed tolerance parameter, we can search over NMFs of different sizes to find the smallest.

I played around with a few sizes, and found that 13 was probably too many. I reduced it to 5, which gave:

In [378]: W.data.nbytes + W.indices.nbytes + W.indptr.nbytes + H.data.nbytes + H.indices.nbytes + H.indptr.nbytes + res_trunc.data.nbyt
     ...: es + res_trunc.indices.nbytes + res_trunc.indptr.nbytes
Out[378]: 3054860

The resulting pointwise error is a little larger, but still not too shabby:

Screen Shot 2021-01-23 at 6 53 27 PM

I think the way forward is going to have to be using this approach instead for the reasons I outlined above. I'm not going to ditch the SVD block (it's important for other kinds of integral equations I eventually want to be able to solve using this library), but I think NMF + sparse is more appropriate for this.

Interested to see what you think about all of this, if you have some thoughts. Incidentally, I chatted with Erwan a bit about doing non-Lambertian reflectance (i.e. use some BRDF), and this NMF stuff generalizes quite nicely to nonnegative tensor factorizations, which I think we could use to compress the 3D tensor analog of this form factor matrix. Not 100% sure how useful this would be, but it could be an interesting thing to try.

sampotter commented 3 years ago

Started to work on putting together a NMF implementation following this paper:

"Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations" by Cichocki and Phan

Implementation is in flux/nmf.py, and some tests are in the notebook examples/nmf_ideas.ipynb.

steo85it commented 3 years ago

Wow, this really seems to help! Now, I'm not sure whether SVD decomposition (or NMF in the new implementation) is applied to all blocks of the FF or only to some of them, but that could also determine if it's an issue to use 1. more terms in the SVD (already increasing from 13 to 26 seemed to go in the right direction); 2. if NMF being 10x slower is really an issue. Does this just apply to the compressed FF assembly or also to operations at each time step? Btw, what truncation of SVD is "as slow as" NMF? Does adding so many terms result in a reasonable approximation w.r.t. groundtruth or NMF? For sure NMF seem to be more "adapted" considering that one "intrinsically" avoids issues with negative terms... That's great for implementation and test, thanks! Curious to test them tomorrow and see what's the impact on the solution I attached above.

sampotter commented 3 years ago

Good questions, all.

I think I got a bit ahead of myself when I was working on this. I'm not sure what I was talking about actually addresses the bug brought up in this issue. ;-) (Although it might.)

I'm going to split this conversation off into a new issue (see link above).

I have an idea what might be causing the bug for this issue. Will update after I have a chance to check.

sampotter commented 3 years ago

OK, made some changes which will hopefully address this to some extent... They are pretty big changes, though. I figured out the way I was using the tol parameter in CompressedFormFactorMatrix didn't make any sense. I'm trying to be more deliberate in the way tolerance is used. The goal will be to set tol to ensure either the relative 2-norm error or max-norm error be bounded by tol, e.g. ||FF_compressed@E - FF@E||/||FF@E|| <= tol, where ||x|| is the 2-norm or max-norm. Not sure which norm will made more sense in the end. Max-norm is preferable, but more demanding.

I made one step in this direction in my latest commits, in particular f6ee25edd09e868bc45a012014886f3ceccffb57. If you have a chance, let me know how this works... I tested with the same region that we've been messing around with in this issue. Ideally, would be good to try out things like tol = 1e-1, 1e-2, 1e-3, ... and see what the effect on the "cross artifact" is and also on the structure of the compressed form factor matrix.

sampotter commented 3 years ago

Paging @nschorgh (see my previous comment about an update to try to start dealing with the "cross artifact"... behavior of assembly will be pretty different from what was there before)

nschorgh commented 3 years ago

@sampotter I was going to look into this, but the updates broke generate_FF_from_grd.py, even with tol=1e-3.

sampotter commented 3 years ago

That's fine---can you tell me how it broke? I will debug

nschorgh commented 3 years ago

My mistake. I forgot to do pip install after the most recent pull.

This is what the temperature looks like with tol=1e-3 (the bottom temperature is zero, so this is huge error).

And the outcome is almost the same for tol = np.finfo(np.float32).resolution, so this doesn't work.

sampotter commented 3 years ago

OK, will try to do debug this. Would you mind including a brief set of steps so I can recreate this plot?

nschorgh commented 3 years ago

Just run python generate_FF_from_grd.py which reads ingersoll81.grd. It will take about 2 minutes to finish. And then python equilbrT_from_FF.py which reads FF.bin I changed the tolerance manually.

sampotter commented 3 years ago

Idea for debugging:

Find a point which has a large error which is obviously wrong, let its index by i.
Let FF_gt be the correct form factor matrix, and let FF be compressed version.
Let x be the matrix with all zeros except x[i] == 1.
Plot the difference FF_gt@x - FF@x.
Could also overlay the "visibility mask" for triangle i (V[i, j] == 1 if triangles i and j can see each other, 0 otherwise)

Optionally: also plot the blocks of the form factor matrix visible from triangle i using CompressedFormFactorMatrix.get_row_blocks

steo85it commented 3 years ago

So, I investigated a bit following our discussion and using some useful routines (i.e., examples.viz_visibility.py) and the lsp_run.py example. I took a portion of some lunar south pole mesh where I see some "characteristic" issues (e.g., bottom right) between the reflected flux from the full (left) and compressed (right) FF. (I selected i0, i1 = 2475, 2525 # instead of 2425, 2525 \\ j0, j1 = 1912, 1962 # instead of 1912, 2012 from the LDEM_80S_150M_adjusted.grd in lsp_make_shape_model.py).

So, I started by comparing the visibility maps for element 500 (between compressed and full), and noticed some behaviors that I don't understand or which look weird to me. (in most "slides", the bottom visibility map is the case I'm describing, while the above one is a "reference/best" case ... not super clear, I apologize) First, the min_size parameter (defining the depth of the quad_tree) seems quite arbitrary (=16632), and it has a decent impact on the result: probably this should be carefully set for each application, depending on the resolution and roughness (?).

Also, within examples.viz_visibility.py, there is a threshold parameter to retrieve the 1/0s of the visibility mask from the FF. These seem to have a large effect even if no svd terms are present in the compressed FF: why is that so? I wouldn't expect the visibility to be compressed in a purely sparse (or even less, dense) FF. Probably I'm missing something here.

Then, as experiment, I tried to force the compressed FF to only have dense blocks. To my surprise, the result is even worse than in the "automatic" case. Is this expected? Is there some approx made to the visibility when generating spmat?

I also noticed that (at least in this specific case and shape_model) no SVD block was actually produced (linalg.get_rank always returned None, resulting in only sparse, dense or 0 blocks). So, I forced non-diag blocks to be sparse with k=40 (I also tested 10 or 80, and it behaves as expected). The results are worse than the ones with only sparse+dense (which is expected, I think), but the FF gets smaller (which is also expected). Also, I'm not sure how these "sparse-only" results came to be: I got from the code that svd is computed for non-diag sparse blocks, then tested against the sparse block for byte-size and "quality": a "better svd" (larger k) is then computed until reaching the size of the sparse block. If this size is reached, the sparse is set-up instead. Maybe this is just an unlucky case, but I'm not sure I understand how this "quality check" is set-up and related to the "tol" value.

So, at least based on these experiments (which again, just focus on the visibility from a single element in a specific shape_model) the issue doesn't seem to be specifically in the SVD blocks, but already present in a compressed FF where all blocks are sparse+dense (which, I thought, would have been close to equivalent to the full FF). Did I miss something or is there something strange, here?

nschorgh commented 3 years ago

I propose we forget the blocks (and the compressed FF) altogether at this stage, focus on demonstrating that the low-rank SVD approach works well for our problem, and write a paper about it. After that we can worry about the size of FF, e.g., using progressive mesh refinements, which would prevent FF from getting excessively large to begin with.

steo85it commented 3 years ago

I was indeed thinking about testing your svd routines on this specific case, to see how they perform (if they perform better, it would be a sign that the issue is not in the svd step/approx but rather somewhere "before that" in the algo). I'll do it after (a late) dinner. Also, I have already been applying the low-rank SVD from your routines to experiments I'm running at several Moon and Mercury craters for thermal modeling: I often end up getting negative fluxes and temperatures, depending on the truncation I apply - and not necessarily improving when using more terms, which looks "dangerous". So, it would still be good to figure out this blocks-issue, I think (not necessarily for the paper at hand, but at least for other planetary applications).

nschorgh commented 3 years ago

Interesting.

A postdoc at X told me that the randomization can introduce quite a large error, so I looked for a deterministic truncated SVD. Iterative (non-random) SVD algorithms that are O(N^2 k) definitely exist (as compared to N^2*log(k) for the randomized approach and N^3 for the full decomposition), but I could not find an implementation.

steo85it commented 3 years ago

I see ... I'm also not aware of all implementations, but maybe we could code them, based on some published algorithm? If the approx one is an issue, that is (at least in this case, it seems to work as well as the full svd).

Directly applying the randomized svd to the same topography gives mixed results: no "block artifact" in the "low-flux" areas, but some negative values. It's interesting to see that the quadtree (left is full-quadtree) behaves best for some blocks, but svd (right is full-svd) is generally closer to the full FF results.

Also (hoping I did everything correctly), the visibility map doesn't look any better than the one from the quadtree Also, I didn't notice any significant differences between the results with the full (scipy) and the approx (sklearn) version of svd, at least in this case (nor for Qrefl, nor for visibility maps) and besides being (as expected) slower.

It makes sense to me to say that the quadtree approach should work best, considering that it applies svd "compression" just to the relations where it "makes sense" to do so. On the other hand, the svd approx doesn't seem to be responsible for these artifacts we have.

nschorgh commented 3 years ago

The singular values should all be positive or zero, even in the truncated version. Are there any negative singular values? Somehow I'm thinking that positive singular values should guarantee that the fluxes are never negative, but I'm not sure why I'm thinking that and I could be wrong. Anyway, it's great to see that at least the absolute errors are small for the truncated approximate SVD, and the temperature gradients are well-behaved too. And it's also great to see that the radomization does not introduce any apparent inaccuracies.

Visibility is calculated by Embree, which has its own fast method, even before FF is calculated. Are you saying there are problems with the (full) visibility matrix or are you not saying that? If the visibility is not symmetric, the view factor matrix would no longer be positive definite, and some of the singular values could become negative.

sampotter commented 3 years ago

@nschorgh Just a quick reply to your comment about what the postdoc told you about randomization:

1) The randomized SVD you are using should be absolutely fine in terms of accuracy. I would not worry about it.

2) A better choice of SVD is ARPACK, which can be called from Python using scipy.sparse.linalg.svds. This will compute a k-term SVD in O(k N^2) time---or better, if you have a function which will multiply with the matrix you want to compress in less than O(N^2) time. It works with sparse matrices, so if you sparse matrix only have M nonzero entries, it will run in O(k M) time.

Basically there is a 2x2 grid of options: ["randomized SVD" or "ARPACK SVD"] and ["recursive blocks" or "compress the whole matrix"]. The latter option is what we need to focus on now.

Norbert, if you think it is reasonable to just get the "compress the whole matrix" approach working for now, and that this is enough material for a paper, I have no problem with just pausing and finishing writing at this point.

steo85it commented 3 years ago

@nschorgh I don't have negative sigmas in any of the tests I performed. Still, the color scale suggests that some isolated Qrefl was negative with 200 terms. And no, I'm NOT suggesting that the full visibility matrix has issues (I'm rather using it as reference, so I wouldn't know of any issues anyway), just saying that the quadtree version shows differences even when containing only dense and/or sparse blocks, which I wouldn't have expected (I thought issues came from the svd blocks).

I have to say that when increasing the randomized svd to 2000 terms (instead of 200 as I tried yesterday, center plot), computation time is still great and residual differences (right plot) from the full FF (left plot) are significantly reduced (using the same color scales as above). also, visibility errors get to the same level as with the quadtree+svd approach. Note: to adapt it to svd, I converted the visibility f=FF@e to

tmp1 = Vt @ e
tmp2 = np.multiply( sigma, tmp1)
f = U @ tmp2

Still, this assumes that one can compute the full FF, which is NOT the case for many of the applications we are considering/working on with Erwan. I clearly don't want to delay efforts for finalizing a paper for the sake of perfectionism (usually not a great idea), but I still think this specific algorithm would need to be perfected (e.g. by computing svds of smaller FF portions on the fly and combining them) to be applicable to higher resolution meshes we are interested in. @sampotter any thoughts about why the dense (or even sparse) FF would show such differences (or are they anyway expected because of the quadtree partition)?

steo85it commented 3 years ago

Ok, after @nschorgh comment, I checked how does the full matrix FF@e=f (with e=1 for a selected element, 0 elsewhere) compare to visibility computed directly from the shape_model, and I get virtually the same result as for the compressed (mostly sparse, 4 levels, no svd blocks) FF: Here it's "easy" to select as visible all values with f[f>0] = 1, on the other hand that's not the case for the svd version, where we get a lot of visibilities f<0 (~25% of values, here). I tried to both take abs(f) before applying the threshold, and setting all f[f<0]=0, but neither works. As shown above, a threshold of 5.e-7 works best in this case, but that's quite arbitrary. Also, what would a negative FF@e mean? Maybe I was simply naif in using the same formulation to compute visibility...

So, what does this mean? Clearly the quadtree partition involving only sparse, empty, and zero blocks seems to be ok (which is what I would have expected: removing zeros shouldn't result in information loss). Computing visibility from the FF results in some loss, but that's possibly expected (some numerical precision loss that we could tune?)? Also, maybe this is a lucky element (this is quite a random element, but I have a couple with "empty visibility" that instead are assigned some from the FF, resulting in "large errors" in Qrefl), since the Qrefl map still looks worse (in terms of artifacts in low Q areas) for the compressed FF than for the full FF... bref, I'm confused (and maybe what I'm sharing from my "personal quest" is already known/clear to everyone except to me - sorry if that's the case)! ^^

steo85it commented 3 years ago

Finally, it surprises me a bit that, when using the same settings, the octree partition works much better than the quadtree for this problem (see both the residual vs fullFF - 83MB - and the visibility matrix). The quadtree starts messing up as soon as an additional level of quadtree is added (by changing the min_size parameter), while the octree seems to scale up/down better. Not sure what this means, just putting it on the table. Also, for the svds here, I included in the code the randomized_svd, and it works fine (besides the usual negative values...).

sampotter commented 3 years ago

@steo85it, a response to your first recently reply. My wife now wants to actually enjoy the weekend so I have had my work session terminated :-)

First, so it doesn't get lost in the text below ;-) If it isn't too much trouble, please commit your scripts to GitHub so I can try them out.

First, the min_size parameter (defining the depth of the quad_tree) seems quite arbitrary (=16632), and it has a decent impact on the result: probably this should be carefully set for each application, depending on the resolution and roughness (?).

The min_size parameter is arbitrary. The motivation behind this parameter:

In the compressed matrix algorithm, we have a free parameter which needs to be tuned: how deep to expand the quadtree. In a perfect world, the accuracy of the MVP should be the same for all choices of this parameter.
Hence, varying this parameter changes how fast the MVP is. For a quadtree with 1 level, we will just have a dense matrix (no compression => slow == O(N^2)). For a quadtree with max levels, each leaf of the quadtree will contain 1 triangle (deep and complicated tree => slow == technically O(N log N) but big constant).
For some number of elements, it will be faster to not descend further in the tree, either storing a dense, sparse, or SVD block. This is mainly because dense linear algebra is very fast, and at some point traversing a tree in Python will waste a lot of time.

I chose to just introduce this min_size parameter: if a leaf has fewer than min_size elements, don't split the leaf node. This is pretty crude. However, using this parameter, you could construct compressed FF matrices for several different choices of min_size and choose the best one.

Also, within examples.viz_visibility.py, there is a threshold parameter to retrieve the 1/0s of the visibility mask from the FF. These seem to have a large effect even if no svd terms are present in the compressed FF: why is that so? I wouldn't expect the visibility to be compressed in a purely sparse (or even less, dense) FF. Probably I'm missing something here.

I am not sure I follow your example correctly, but I will try to explain that examples/viz_visability.py example.

The comment on lines 25-26 ("The form factor matrix only approximately stores the visibility information.") is referring only to the approximation that can occur because of the SVD truncation. If a FF block is stored as a dense or sparse matrix, the visibility mask should be able to be recovered exactly by checking which elements are nonzero.
However, my example is a little confused (sorry). There is no reason to use that eps_f parameter. In line 30, it is best to just check if abs(f) > 0 (if an entry is nonzero, the compressed FF matrix says that there is visibility, whether that matches the groundtruth or not).
There is also the eps parameter (line 40). This is completely different. This is a parameter used to push the start of the ray away from the center of the triangle it starts from so that it doesn't actually hit it. I'm not sure if this is necessary anymore. It should be possible to set the tnear parameter on line 159 of flux/shape.py, but I was not able to get this to work satisfyingly. You can try setting eps = 0 and see if it affects the computed groundtruth visibility mask very much (i.e. vis on line 41).

I am not sure whether you are tweaking eps_f or eps in your tests.

Then, as experiment, I tried to force the compressed FF to only have dense blocks. To my surprise, the result is even worse than in the "automatic" case. Is this expected? Is there some approx made to the visibility when generating spmat?

Thanks for finding this. You have absolutely found a bug and your test should be included as a unit test to fix.

Here is the property to validate: If the leaf nodes of a compressed form factor matrix are only DenseBlocks or SparseBlocks, then it should agree (up to floating point round-off) with the original uncompressed FF matrix.

This should be true for the visibility mask as well as the form factor matrix values themselves. So we should test both things (it looks you only checked whether the visibility masks matched).

Also, I'm not sure how these "sparse-only" results came to be: I got from the code that svd is computed for non-diag sparse blocks, then tested against the sparse block for byte-size and "quality": a "better svd" (larger k) is then computed until reaching the size of the sparse block. If this size is reached, the sparse is set-up instead. Maybe this is just an unlucky case, but I'm not sure I understand how this "quality check" is set-up and related to the "tol" value.

This is basically right. The only extra thing is that the tol value is used to check the accuracy of the MVP by looking at the singular values, sigma1 >= sigma2 >= ... >= sigmaK. The test (in flux.linalg.estimate_rank) is sqrt(N)*S[0]*S <= tol.

Here is another thing that can be tested (and used to generate data for the last test I suggested): if tol == 0, all blocks should be dense or sparse.

So, at least based on these experiments (which again, just focus on the visibility from a single element in a specific shape_model) the issue doesn't seem to be specifically in the SVD blocks, but already present in a compressed FF where all blocks are sparse+dense (which, I thought, would have been close to equivalent to the full FF). Did I miss something or is there something strange, here?

Your understanding is correct and you have identified a bug.

steo85it commented 3 years ago

Hi Sam, thanks for the exhaustive answer and sorry for perturbing your weekend: that was definitely not my goal! :)

First, here is my "visibility tests" version : it's definitely not an "operative push", since it's full of debug print-outs and potentially dangerous tests, but it should work by running flux.lsp_run.py and then checking the results with examples.viz_visibility.py (btw, thanks for pushing those viz_ routines, they were super useful!). If you plan to use it, maybe just check it out locally as a stand-alone temporary version.

Other than that, all your answers are clear and I understand them, thanks. Just a couple of (maybe useful) points:

I replaced the eps parameter (line 40 of viz_visibility.py) by an automatic displacement (in both shape.check_vis_1_to_N and shape.get_direct_irradiance) based on the area of the facet. That's because in some recent experiments, we have noticed with Erwan that a fix eps value can introduce (very prominent) artifacts in the flux (and T) results. If the user wants, it's still possible to impose a value through the eps parameter.

Here is the property to validate: If the leaf nodes of a compressed form factor matrix are only DenseBlocks or SparseBlocks, then it should agree (up to floating point round-off) with the original uncompressed FF matrix.
This is very useful to know! So, at some point I actually managed to validate this using only dense and sparse blocks, but only when imposing a single quadtree level (so, depth=1, or 16 blocks - I wouldn't know how to set-up a depth=0 FF, other than sticking to the fullFF). This was great to have a "ground truth" with all the compressedFF attributes to compare to. I could also introduce svd blocks w/o significant losses, and with a 50% reduction in size. However, as soon as I add a second level (even of only dense blocks) then errors start to sneak in, in both flux and visibility. As mentioned the octree seems less affected by this, I don't know why. p.s.: I pushed to the same branch also a draft test for this (examples.leafs_test.py)... please correct the cFF-fFF loop, I'm not sure how to retrieve FF elements.

Here is another thing that can be tested (and used to generate data for the last test I suggested): if tol == 0, all blocks should be dense or sparse.

This works, and it's one way I used to enforce only dense/sparse blocks in my test FFs (on top of playing around with the min_size parameter and with the 'ifs' in compressed_form_factors.FormFactorBlockMatrix.make_block() ) .

sampotter commented 2 years ago

@steo85it @nschorgh

Okay, I think I've finally fixed this bug.

My current changes are in the branch debug-compressed-form-factor.

The important fix is in commit ead14360c1be522bfdafaaf58a99308b195d3fc8. The gist of this fix is the following:

When I build the compressed form factor matrix, when I "come back up" the tree, if I encounter an internal tree node (e.g. quadtree node or octree node), if its children were either all dense or all sparse, I would concatenate them together into a single dense or sparse node, replacing the internal tree node with a leaf node of that type.
The bug was that I was failing to undo the quadtree/octree permutation of each of these leaf nodes before concatenating them, essentially forgetting these permutations and leaving the entries of these matrices in the wrong order. Obviously the matrix-vector multiplication will be wrong at this point.
The fix was very easy to implement without getting rid of this optimization. Since I'm already carrying around the correct form factor matrix (as the variable spmat), when I detect this case, I just store this node directly and delete the tree node.

To test this, I ran a large example for the Haworth crater and computed the steady state temperature. Here are the results:

lunar_south_pole

haworth_E

haworth_T

haworth_T_shadow

haworth_blocks

Recall that in the block matrix plot, red = sparse matrix node, cyan = SVD node.

For this test, I used distmesh to make a triangle mesh with about 70K triangles, and set tol=1e-3 and min_size=512. The settings I used are what is currently in the file examples-new/lunar_south_pole/haworth.py pushed to GitHub.

So, there are no more blocky artifacts... I believe this should fix it.

If either of you have time, it would be helpful to re-run some of the previous tests to confirm, then I'll make a PR for my changes.

Next I'm going to start working on some more extensive numerical tests for our paper and get writing.

Edit: by the way, one thing I did while fixing this bug (changes in my branch) is to add CGAL as another choice of raytracer: TrimeshShapeModel now has two subclasses, CgalTrimeshShapeModel and EmbreeTrimeshShapeModel. I did these tests using CGAL because in the process of debugging I found that Embree had some weird errors that were quite bad and hard to explain. That said, I don't think Embree is responsbile for this bug, since usually the problem with Embree would be along the lines of: "a ray shoots through a triangle it should definitely hit, for no discernible reason". If you think about it, if there is a single missing ray hit from a smooth surface with many adjacent triangles which are all hit, the result should just be a relatively small error in the computed flux, distributed over a large area... that is, not really like the areas we're observing here, which clearly reflect the block structure of the compressed form factor matrix.

steo85it commented 2 years ago

As mentioned elsewhere: thanks for debugging this, the results look veeeeeeery promising! Yuppie!

If either of you have time, it would be helpful to re-run some of the previous tests to confirm, then I'll make a PR for my changes.

And yes, I'm rerunning them, both with CGAL and Embree to check if we can safely use the "fast option" when cross-checked against the "safe one".

sampotter commented 2 years ago

Fixed in PR https://github.com/sampotter/python-flux/pull/42

sampotter / fluxpy

weird artifact in T results at LSP #6

Using a nonnegative matrix factorization instead