halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org

BGU app not 100% correct. #7681

Closed: mcourteaux closed this 1 year ago

mcourteaux commented 1 year ago

I'm toying around with the Bilateral Guided Upsampling demo code, and it does not seem to produce correct results all the time.

Specifically, according to my tests, all the wrong results are wrong in the exact same way. Using this random picture of a tree: https://www.gardeningknowhow.com/wp-content/uploads/2020/12/lonely-japanese-cherry.jpg

Running and saving all the different versions gets me this:

- CUDA manual schedule: (image: manual_out)
- CUDA auto schedule: (image: auto_out)
- CPU manual schedule: (image: manual_out)
- CPU auto schedule: (image: auto_out)

Q1: Notice the red-blue discoloration in the sky. Could this be a data-clipping issue? The colors look like overflows or something. Here is another example of this problem in another picture of mine: (image)

Q2: It's been a while since I was hands-on with Halide, but my experience always told me that Halide promises the result to be the same regardless of the schedule (as long as you don't use allow_race_conditions() and the like). Still, here is another example where this promise seems broken. Is there an FAQ or guide on why this might happen and what to pay attention to regarding these inconsistencies?

Q3: Is there a way to reduce the block artifacts here? Looking at the one that worked (the first: CUDA manual schedule), there are still quite a few block artifacts visible. Intuition tells me this is due to a lack of diffusion through the bilateral grid (which should ensure smoothness, if I understand the explanatory video correctly), but the blurring settings are fixed.

abadams commented 1 year ago

I suspect what's going on is that for some inputs, BGU is numerically unstable. @jiawen and I tried to regularize it correctly, but I suspect there's still something wrong, and it's trying to invert nearly-singular matrices in unpopulated regions of the grid.

You would expect this to be susceptible to floating-point instability issues, which is the main way in which Halide pipelines can produce different outputs with different schedules. (The other way is if you pass the same buffer for input and output - Halide assumes they're distinct).
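To make that mechanism concrete, here is a minimal standalone sketch (plain C++, my own illustration, not Halide or BGU code): two summation orders, as two different schedules might produce, round differently, and a nearly-singular solve amplifies the tiny discrepancy into a visible difference.

```cpp
// Sketch: schedule-dependent float rounding amplified by a near-singular solve.
#include <cstdio>

int main() {
    const float v[4] = {1e8f, 1.0f, -1e8f, 1.0f};

    // "Schedule 1": accumulate left to right. 1e8f + 1.0f rounds back to
    // 1e8f (the spacing between floats near 1e8 is 8), so one 1.0f is lost.
    float sum1 = ((v[0] + v[1]) + v[2]) + v[3];  // == 1.0f

    // "Schedule 2": a reassociated order (e.g. from splitting or
    // vectorizing a reduction) keeps both 1.0f terms.
    float sum2 = (v[0] + v[2]) + (v[1] + v[3]);  // == 2.0f

    // Dividing by a nearly-zero pivot, as when inverting a nearly-singular
    // matrix, turns that tiny discrepancy into a large output difference.
    const float tiny_pivot = 1e-7f;
    std::printf("x1 = %g, x2 = %g\n", sum1 / tiny_pivot, sum2 / tiny_pivot);
    return 0;
}
```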

mcourteaux commented 1 year ago

Just asking for clarity: do you expect this to be an implementation issue, or a mathematical issue with how the regularization is done in the paper you wrote?

mcourteaux commented 1 year ago

I'm looking at the paper and at the code, and I'm having some trouble interpreting how one maps to the other. The paper talks about 3x4 matrices as transformations. I'll try to map the idea of the paper onto concepts I know (fitting multivariate Gaussian distributions to samples, and then using the conditional distribution mean as a regression function).

I understand that you first need to accumulate statistics to be able to fit the linear regression, and you seem to be accumulating these as E[s], E[d], E[s*d^T], and E[s*s^T], with s the "source" and d the "destination". With the linear algebra I work with regularly, this is the part I understand. The gradient matrix, which defines the slope from "source" to "destination", should be given by:

G_ds = R_ds * inv(R_ss)

G_ds is here the result of the solve_symmetric call, as R_ss is the SPD covariance matrix. The offset color should then be:

o = E[d] - G_ds * E[s]

Such that the mapping from source to destination color:

dst = E[d] + G_ds * (src - E[s])
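For intuition, here is a scalar (1-D color) version of that fit as I understand it; this is my own sketch, not the app's code:

```cpp
// Scalar sketch of the per-cell affine fit: accumulate moments over the
// samples, then the least-squares slope is the covariance over the variance.
#include <cstdio>

int main() {
    // Toy (source, destination) sample pairs for one grid cell.
    const double s[] = {0.2, 0.4, 0.6, 0.8};
    const double d[] = {0.25, 0.45, 0.70, 0.85};
    const int n = 4;

    double Es = 0, Ed = 0, Esd = 0, Ess = 0;
    for (int i = 0; i < n; i++) {
        Es  += s[i];         // sum of s  -> E[s]
        Ed  += d[i];         // sum of d  -> E[d]
        Esd += s[i] * d[i];  // sum of sd -> E[s*d]
        Ess += s[i] * s[i];  // sum of ss -> E[s*s]
    }
    Es /= n; Ed /= n; Esd /= n; Ess /= n;

    // R_ds = E[s*d] - E[s]E[d], R_ss = E[s*s] - E[s]^2.
    double g = (Esd - Es * Ed) / (Ess - Es * Es);  // G_ds = R_ds * inv(R_ss)
    double o = Ed - g * Es;                        // o = E[d] - G_ds * E[s]

    // dst = E[d] + G_ds * (src - E[s]) == g * src + o
    std::printf("slope g = %f, offset o = %f, dst(0.5) = %f\n",
                g, o, g * 0.5 + o);
    return 0;
}
```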

Now, with this interpretation (which I hope is correct), let's look at the regularization, which is there to stabilize the inversion of the R_ss matrix. Regularizing pulls the solve closer to the identity matrix, i.e. a unit slope (that sounds good to me). However, the paper states that alpha_44 (which is the count of constraints, i.e. the number of samples contributing to the fit, coming from the artificially added fourth element of constant 1 in the color sample) is used to scale the regularization. This seems super counterintuitive to me: the more samples you have, the better the fit will be, while having very few samples will yield a bogus fit, so you'd want to regularize that case more. So, to me the logic of using alpha_44 (or N in the code) seems backwards.

Could I be onto something?
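To put rough numbers on that intuition, a small sketch of my own, reusing the lambda constant from the generator:

```cpp
// Rough illustration of why scaling lambda by the constraint count N seems
// backwards: the diagonal boost vanishes exactly where the fit is most
// under-determined (nearly empty, nearly singular cells).
#include <cstdio>

int main() {
    const float lambda = 1e-6f;              // constant from the generator
    const int counts[] = {0, 1, 10, 10000};  // constraints N in a grid cell

    for (int N : counts) {
        float weighted = lambda * (N + 1);   // paper/code scheme
        std::printf("N = %5d: lambda*(N+1) = %g  vs  a constant 0.1\n",
                    N, weighted);
    }
    return 0;
}
```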


Just a thought: in the zoomed-in picture I sent of the out-of-focus tree branches, the problem seems to occur close to the edges of the branches. So the color range there is not particularly narrow; the z-axis of the bilateral grid might indeed be sparsely populated.

mcourteaux commented 1 year ago

I tried this, and simply collapsing the regularization to a constant works just great:

```diff
diff --git a/apps/bgu/bgu_generator.cpp b/apps/bgu/bgu_generator.cpp
index 653e5ed87..d3feac9c8 100644
--- a/apps/bgu/bgu_generator.cpp
+++ b/apps/bgu/bgu_generator.cpp
@@ -397,20 +397,8 @@ public:
 
             // Regularize by pushing the solution towards the average gain
             // in this cell = (average output luma + eps) / (average input luma + eps).
-            const float lambda = 1e-6f;
-            const float epsilon = 1e-6f;
-
-            // The bottom right entry of A is a count of the number of
-            // constraints affecting this cell.
-            Expr N = A(3, 3);
-
-            // The last row of each matrix is the sum of input and output
-            // RGB values for the pixels affecting this cell. Instead of
-            // dividing them by N+1 to get averages, we'll multiply
-            // epsilon by N+1. This saves two divisions.
-            Expr output_luma = b(3, 0) + 2 * b(3, 1) + b(3, 2) + epsilon * (N + 1);
-            Expr input_luma = A(3, 0) + 2 * A(3, 1) + A(3, 2) + epsilon * (N + 1);
-            Expr gain = output_luma / input_luma;
+            const float lambda = 1e-1f;
 
             // Add lambda and lambda*gain to the diagonal of the
             // matrices. The matrices are sums/moments rather than
@@ -419,15 +407,15 @@ public:
             // to the diagonal of a covariance matrix. Otherwise it does
             // nothing in cells with lots of linearly-dependent
             // constraints.
-            Expr weighted_lambda = lambda * (N + 1);
+            Expr weighted_lambda = lambda;
             A(0, 0) += weighted_lambda;
             A(1, 1) += weighted_lambda;
             A(2, 2) += weighted_lambda;
             A(3, 3) += weighted_lambda;
 
-            b(0, 0) += weighted_lambda * gain;
-            b(1, 1) += weighted_lambda * gain;
-            b(2, 2) += weighted_lambda * gain;
+            b(0, 0) += weighted_lambda; // * gain;
+            b(1, 1) += weighted_lambda; // * gain;
+            b(2, 2) += weighted_lambda; // * gain;
 
             // Now solve Ax = b
             Matrix<3, 4> result = transpose(solve_symmetric(A, b, line, x, using_autoscheduler(), get_target()));
```

Choosing lambda to be a constant 0.1 makes it behave like one tenth of an "identity linear" sample contributing to the regression fit, pulling the solution towards the identity function. This simplification also speeds up the pipeline a bit, as a few operations disappear (in particular the gain division).
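In the scalar picture from earlier, this constant-lambda variant is just ridge regression towards the identity; a quick sketch of the limiting behavior (again my own illustration, not the app's code):

```cpp
// Scalar sketch of the constant-lambda behavior: solving
// (A + lambda) * g = (b + lambda) is ridge regression towards slope 1.
#include <cstdio>

int main() {
    const double lambda = 0.1;

    // (A, b) = (sum of s*s, sum of s*d) accumulated in one grid cell.
    struct Cell { double A, b; const char *desc; };
    const Cell cells[] = {
        {0.0, 0.0, "empty cell"},           // no constraints at all
        {0.02, 0.05, "sparse, slope 2.5"},  // few samples, wild slope
        {300.0, 310.0, "dense, slope ~1.03"},
    };

    for (const Cell &c : cells) {
        double g = (c.b + lambda) / (c.A + lambda);  // regularized solve
        std::printf("%-20s g = %f\n", c.desc, g);
    }
    return 0;
}
```

With no constraints at all the solve returns exactly the identity mapping, and in densely populated cells the 0.1 is negligible next to the accumulated moments.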

Result:

(image)

And the cherry tree: (image: manual_out)

mcourteaux commented 1 year ago

The only thing I'm not sure about is that I changed it to regularize towards the identity function, instead of "towards the average gain in the cell". The problem I have with "the average gain" is that you also need samples to compute the average gain, which is exactly the problem being solved.

I'll open a PR with my cleaned up changes.

mcourteaux commented 1 year ago

The regularization that was in place seemed designed more to deal with having lots of samples and still getting a near-infinite slope. I think that was for the matting task (but I thought the results were not satisfying there either way?).

mcourteaux commented 1 year ago

I guess we can close this, unless we want to keep this open for @jiawen to respond?

jiawen commented 1 year ago

Sorry for the slow response. I'll try to find some cycles to look at this in the morning.