AlexLipp closed this issue 1 year ago.
Some investigation yields non-promising results:

```
(a * cp.inv_pos(b)).is_dcp() = False
(a * cp.inv_pos(b)).is_dqcp() = False
(a/b).is_dcp() = False
(a/b).is_dqcp() = True
(cp.maximum(a/b,b/a)).is_dcp() = False
(cp.maximum(a/b,b/a)).is_dqcp() = True
(cp.maximum(a/b,b/a)+cp.maximum(c/d,d/c)).is_dcp() = False
(cp.maximum(a/b,b/a)+cp.maximum(c/d,d/c)).is_dqcp() = False
(cp.maximum(a/b,b/a)+cp.maximum(c/d,d/c)).is_dgp() = True
cp.norm(cp.vstack([cp.maximum(a/b,b/a),cp.maximum(c/d,d/c)])).is_dgp() = True
```
So it's possible we can make this into a geometric program, though cvxpy doesn't suggest that for our full program, so that could require some investigation.
I don't think we can make a geometric program here. We have:

```python
normalised_concentration = my_data.total_flux / my_data.total_upstream_area
self._primary_terms.append(cp_log_ratio_norm(normalised_concentration, observed))
```

but at some point `my_data.total_flux` is the sum of two upstream nodes. In this case `cp_log_ratio_norm` gives:

```python
max(a / (b + c), (b + c) / a)
```
The short test program:
```python
import cvxpy as cp

a = cp.Variable(pos=True)
b = cp.Variable(pos=True)
c = cp.Variable(pos=True)

print("(a / (b + c)).is_dgp() = ", (a / (b + c)).is_dgp())
print("((b + c) / a).is_dgp() = ", ((b + c) / a).is_dgp())
print("cp.maximum(a / (b + c), (b + c) / a).is_dgp() = ", cp.maximum(a / (b + c), (b + c) / a).is_dgp())
```

gives

```
(a / (b + c)).is_dgp() = True
((b + c) / a).is_dgp() = True
cp.maximum(a / (b + c), (b + c) / a).is_dgp() = False
```
So the maximum is a problem here.
I have some more ideas here, but I'll need to play around with them on paper.
I have found that this doesn't affect the results, but in

```python
a = cp.Variable(pos=True)
b = cp.Variable(pos=True)
c = cp.Variable(pos=True)
```

I think `a` should be a `cp.Parameter`, not a `cp.Variable`, as it is always an observation, not a free variable.
One approach (if we cannot find a DCP or DGP way to implement roughness) is to regularize the relative deviations of each prediction from the mean observation, as opposed to the difference between adjacent catchments. This would potentially result in 'rougher' solutions, but as we are already discretising the problem using drainage basins (which are in general irregularly shaped) this may not be too undesirable. Inspecting the current outputs, I think the relative, not absolute, differences are probably the more important consideration.
As `obs_mean` would be a `cp.Parameter`, we can use the (DCP-compliant) `cp_log_ratio` function as previously.
The following test program prints `True`:

```python
import cvxpy as cp

a = cp.Parameter(pos=True)  # e.g., obs_mean
b = cp.Parameter(pos=True)  # e.g., an observation
c = cp.Variable(pos=True)   # e.g., a subcatchment concentration
d = cp.Variable(pos=True)   # e.g., a subcatchment concentration

print(
    cp.norm(
        cp.vstack(
            [
                cp.maximum(a * cp.inv_pos(c), c * cp.inv_pos(a)),  # Regularizer term
                cp.maximum(b * cp.inv_pos(c + d), (c + d) * cp.inv_pos(b)),  # Misfit term
            ]
        )
    ).is_dcp()
)
```
@AlexLipp: Do you mean for `a` to be the mean of the observations of two adjacent catchments? (If so, have you considered whether an arithmetic, harmonic, or geometric mean is best?)
@r-barnes Actually, I mean `a` to be the global mean of all observations, not just the two adjacent catchments. In that sense we are not calculating 'roughness' anymore but just model size.

But you raise a good point about means. As relative differences are the name of the game, we would want the geometric mean for this.
I note that currently when using `obs_mean` to normalise the observations we use the arithmetic mean. As this is just a way to scale the data to prevent numerical issues, I suspect it doesn't matter very much, but perhaps if we do start using `obs_mean` for regularisation too we should update it to the geometric mean for consistency.
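For concreteness, a minimal sketch (with made-up numbers) of the difference between the two means: for multiplicatively spread (log-normal-like) data the geometric mean is the natural centre, which matches the relative (log-ratio) misfit we use:

```python
import numpy as np

# Hypothetical observations spanning orders of magnitude
observations = np.array([1.0, 10.0, 100.0])

arithmetic_mean = observations.mean()                 # dominated by the largest value
geometric_mean = np.exp(np.log(observations).mean())  # exp of the mean log

print(arithmetic_mean)  # 37.0
print(geometric_mean)   # ~10.0
```

Since the geometric mean is `exp(mean(log(x)))`, regularising relative deviations from it is equivalent to penalising deviations of `log(x)` from its mean.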
I thought taking the maximum of all the regularization terms would be nice, but then we're adding `cp.maximum() + cp.maximum()`, where the first is one of many primary terms and the second is the regularizer. That breaks the DQCP guarantees :-(
The regularizer terms `cp.maximum(a/b, b/a)` are all DGP, but the primary terms `cp.maximum(a/(b+c+d), (b+c+d)/a)` are not.
I've come up with a solution in #27. Instead of penalising roughness it penalises relative deviations from the mean observation. This permits rougher solutions but they are much better conditioned in general, which I believe is preferable.
Thinking about situations like sewage networks, this might be the most general approach to regularize the solution as we wouldn't necessarily expect (for example) geographically adjacent sewage outlets to be 'smoothly' linked.
Unfortunately, this mode of regularization cannot be applied to the 'continuous' method I'm developing (#24) which is reliant on smoothness to regularize... so a convex way of calculating relative difference between variables would still be useful to discover.
Closed with #27
The absolute-difference regularizer,

```python
self._regularizer_terms.append(border_length * (a_concen - b_concen))
```

is sensitive to the absolute difference in concentrations between adjacent nodes. This is not ideal as it disproportionately punishes variability between high-concentration nodes, which might result in 'clipping' of log-normal peaks. Unfortunately, the DCP-compliant relative difference term we use in the objective,

```python
self._regularizer_terms.append(border_length * cp_log_ratio_norm(a_concen, b_concen))
```

does not appear to be DCP for the regularizer.