AlexLipp closed this issue 1 year ago.
Some investigation yields non-promising results:

```
(a * cp.inv_pos(b)).is_dcp() = False
(a * cp.inv_pos(b)).is_dqcp() = False
(a/b).is_dcp() = False
(a/b).is_dqcp() = True
(cp.maximum(a/b,b/a)).is_dcp() = False
(cp.maximum(a/b,b/a)).is_dqcp() = True
(cp.maximum(a/b,b/a)+cp.maximum(c/d,d/c)).is_dcp() = False
(cp.maximum(a/b,b/a)+cp.maximum(c/d,d/c)).is_dqcp() = False
(cp.maximum(a/b,b/a)+cp.maximum(c/d,d/c)).is_dgp() = True
cp.norm(cp.vstack([cp.maximum(a/b,b/a),cp.maximum(c/d,d/c)])).is_dgp() = True
```
So it's possible we can make this into a geometric program, though cvxpy doesn't suggest that for our full program, so that could require some investigation.
I don't think we can make a geometric program here. We have:

```python
normalised_concentration = my_data.total_flux / my_data.total_upstream_area
self._primary_terms.append(cp_log_ratio_norm(normalised_concentration, observed))
```

but at some point `my_data.total_flux` is the sum of two upstream nodes. In this case `cp_log_ratio_norm` gives:

```python
max(a / (b + c), (b + c) / a)
```
The short test program:
```python
import cvxpy as cp

a = cp.Variable(pos=True)
b = cp.Variable(pos=True)
c = cp.Variable(pos=True)

print("(a / (b + c)).is_dgp() = ", (a / (b + c)).is_dgp())
print("((b + c) / a).is_dgp() = ", ((b + c) / a).is_dgp())
print("cp.maximum(a / (b + c), (b + c) / a).is_dgp() = ", cp.maximum(a / (b + c), (b + c) / a).is_dgp())
```

gives

```
(a / (b + c)).is_dgp() = True
((b + c) / a).is_dgp() = True
cp.maximum(a / (b + c), (b + c) / a).is_dgp() = False
```
So the maximum is a problem here.
I have some more ideas here, but I'll need to play around with them on paper.
I have found that this doesn't affect the results, but in

```python
a = cp.Variable(pos=True)
b = cp.Variable(pos=True)
c = cp.Variable(pos=True)
```

I think `a` should be a `cp.Parameter`, not a `cp.Variable`, as it is always an observation, not a free variable.
One approach (if we cannot find a DCP or DGP way to implement roughness) is to regularize the relative deviations of each prediction from the mean observation, as opposed to the difference between adjacent catchments. This would potentially result in 'rougher' solutions, but as we are already discretising the problem using drainage basins (which are in general irregularly shaped) this may not be too undesirable. Inspecting the current outputs, I think the relative, not absolute, differences are probably the more important consideration.
As `obs_mean` would be a `cp.Parameter`, we can use the (DCP-compliant) `cp_log_ratio` function as previously.
The following test program prints `True`:

```python
import cvxpy as cp

a = cp.Parameter(pos=True)  # e.g., obs_mean
b = cp.Parameter(pos=True)  # e.g., an observation
c = cp.Variable(pos=True)   # e.g., a subcatchment concentration
d = cp.Variable(pos=True)   # e.g., a subcatchment concentration

print(
    cp.norm(
        cp.vstack(
            [
                cp.maximum(a * cp.inv_pos(c), c * cp.inv_pos(a)),  # Regularizer term
                cp.maximum(b * cp.inv_pos(c + d), (c + d) * cp.inv_pos(b)),  # Misfit term
            ]
        )
    ).is_dcp()
)
```
@AlexLipp: Do you mean for `a` to be the mean of the observations of two adjacent catchments? (If so, have you considered whether an arithmetic, harmonic, or geometric mean is best?)
@r-barnes Actually, I mean `a` to be the global mean of all observations, not just the two adjacent catchments. In that sense we are not calculating 'roughness' anymore but just model size.

But you raise a good point about means. As relative differences are the name of the game, we would want the geometric mean for this.
I note that currently when using `obs_mean` to normalise the observations we use the arithmetic mean. As this is just a way to scale the data to prevent numerical issues, I suspect it doesn't matter very much, but perhaps if we do start using `obs_mean` for regularisation too we should update it to the geometric mean for consistency.
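For concreteness, a minimal sketch (with made-up numbers) of the difference between the two means: for multiplicatively spread (log-normal-like) data the geometric mean is the natural centre, which matches the relative (log-ratio) misfit we use:

```python
import numpy as np

# Hypothetical observations spanning orders of magnitude
observations = np.array([1.0, 10.0, 100.0])

arithmetic_mean = observations.mean()                 # dominated by the largest value
geometric_mean = np.exp(np.log(observations).mean())  # exp of the mean log

print(arithmetic_mean)  # 37.0
print(geometric_mean)   # ~10.0
```

Since the geometric mean is `exp(mean(log(x)))`, regularising relative deviations from it is equivalent to penalising deviations of `log(x)` from its mean.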
I thought taking the maximum of all the regularization terms would be nice, but then we're adding `cp.maximum() + cp.maximum()`, where the first is one of many primary terms and the second is the regularizer. That breaks the DQCP guarantees :-(
The regularizer terms `cp.maximum(a/b, b/a)` are all DGP, but the primary terms `cp.maximum(a/(b+c+d), (b+c+d)/a)` are not.
I've come up with a solution in #27. Instead of penalising roughness it penalises relative deviations from the mean observation. This permits rougher solutions but they are much better conditioned in general, which I believe is preferable.
Thinking about situations like sewage networks, this might be the most general approach to regularize the solution as we wouldn't necessarily expect (for example) geographically adjacent sewage outlets to be 'smoothly' linked.
Unfortunately, this mode of regularization cannot be applied to the 'continuous' method I'm developing (#24) which is reliant on smoothness to regularize... so a convex way of calculating relative difference between variables would still be useful to discover.
Closed with #27
The absolute-difference regularizer,

```python
self._regularizer_terms.append(border_length * (a_concen - b_concen))
```

is sensitive to the absolute difference in concentrations between adjacent nodes. This is not ideal as it disproportionately punishes variability between high-concentration nodes, which might result in 'clipping' of log-normal peaks. Unfortunately, the DCP-compliant relative difference term we use in the objective,

```python
self._regularizer_terms.append(border_length * cp_log_ratio_norm(a_concen, b_concen))
```

does not appear to be DCP for the regularizer.