svniemeijer opened 7 years ago
We should create this in such a way that the core function is a C library function that returns a HARP product. We can then also introduce a `harp.diff` Python function that returns the difference of two products/datasets (using the same underlying code).
See also Wikipedia. It is probably better to define `_diffabsrelavg` as 2|x-y|/(|x|+|y|).
We might also want to distinguish between absolute and signed differences, and between absolute and signed scaling for relative differences. For instance, we might want to use absolute scaling for a signed difference: 2(x-y)/(|x|+|y|).
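To illustrate why the denominator matters, here is a minimal Python sketch comparing 2|x-y|/|x+y|, 2|x-y|/(|x|+|y|), and the signed variant with absolute scaling. The function names are hypothetical and not part of HARP. When x and y have opposite signs, |x+y| can get close to zero and the first form blows up, whereas |x-y| ≤ |x|+|y| keeps the second form within [0, 2] (and the signed variant within [-2, 2]).

```python
import math

def relavg_abs_of_sum(x: float, y: float) -> float:
    """2|x-y|/|x+y|; returns NaN if x + y == 0."""
    denom = abs(x + y)
    return math.nan if denom == 0 else 2 * abs(x - y) / denom

def relavg_sum_of_abs(x: float, y: float) -> float:
    """2|x-y|/(|x|+|y|); always in [0, 2]; NaN if x == y == 0."""
    denom = abs(x) + abs(y)
    return math.nan if denom == 0 else 2 * abs(x - y) / denom

def relavg_signed_abs_scaling(x: float, y: float) -> float:
    """2(x-y)/(|x|+|y|): signed difference with absolute scaling, in [-2, 2]."""
    denom = abs(x) + abs(y)
    return math.nan if denom == 0 else 2 * (x - y) / denom

# Opposite signs: x + y is close to zero.
x, y = 1.0, -0.9
print(relavg_abs_of_sum(x, y))          # large (about 38)
print(relavg_sum_of_abs(x, y))          # 2.0
print(relavg_signed_abs_scaling(y, x))  # -2.0
```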
This tool should create a new dataset that contains the differences between two input datasets. The input datasets either both need to have a collocation_index variable (which is then used to match up pairs) or need to be temporally aligned already.
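A minimal sketch of the pairing step, assuming each dataset is given as plain Python lists of collocation_index values and sample values, and that each collocation_index occurs at most once per dataset. The function name and data layout are hypothetical; HARP's actual product structures are not modelled here.

```python
from typing import Dict, List, Tuple

def pair_by_collocation_index(
    x_index: List[int], x_values: List[float],
    y_index: List[int], y_values: List[float],
) -> List[Tuple[int, float, float]]:
    """Match samples of x and y that share a collocation_index.

    Assumes each collocation_index occurs at most once per dataset.
    Pairs are returned in the order of x.
    """
    # Build a lookup table for y so each x sample is matched in O(1).
    y_lookup: Dict[int, float] = dict(zip(y_index, y_values))
    return [
        (idx, xv, y_lookup[idx])
        for idx, xv in zip(x_index, x_values)
        if idx in y_lookup
    ]

# Samples without a counterpart in the other dataset are dropped.
pairs = pair_by_collocation_index([3, 1, 2], [30.0, 10.0, 20.0],
                                  [2, 3, 5], [200.0, 300.0, 500.0])
print(pairs)  # [(3, 30.0, 300.0), (2, 20.0, 200.0)]
```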
There are different ways to calculate differences, and we should think about supporting the following types (using 'x' and 'y' as names for the datasets, and 'x-y' as the baseline difference). These are the postfixes that should be added to the variable names:
- `_diff`: x-y
- `_diffrelx`: (x-y)/x
- `_diffrely`: (x-y)/y
- `_diffrelmin`: (x-y)/min(x,y)
- `_diffrelmax`: (x-y)/max(x,y)
- `_diffrelavg`: 2(x-y)/(x+y)
- `_diffabs`: |x-y|
- `_diffabsrelx`: |x-y|/|x|
- `_diffabsrely`: |x-y|/|y|
- `_diffabsrelmin`: |x-y|/min(|x|,|y|)
- `_diffabsrelmax`: |x-y|/max(|x|,|y|)
- `_diffabsrelavg`: 2|x-y|/|x+y|

Calculating differences will only be supported for variables that have a unit attribute (which may be empty for unitless quantities, but should not be omitted).
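The postfix list maps naturally onto a table of formulas. Below is a minimal Python sketch for scalar values; the names `DIFFERENCES` and `compute_differences` are hypothetical, the `unit` argument only mimics the unit-attribute rule (None means the attribute is missing, "" is an allowed empty unit), and divisions by zero are mapped to NaN as an assumed convention.

```python
import math
from typing import Callable, Dict, Optional

# Hypothetical mapping from each proposed postfix to its formula.
DIFFERENCES: Dict[str, Callable[[float, float], float]] = {
    "_diff": lambda x, y: x - y,
    "_diffrelx": lambda x, y: (x - y) / x,
    "_diffrely": lambda x, y: (x - y) / y,
    "_diffrelmin": lambda x, y: (x - y) / min(x, y),
    "_diffrelmax": lambda x, y: (x - y) / max(x, y),
    "_diffrelavg": lambda x, y: 2 * (x - y) / (x + y),
    "_diffabs": lambda x, y: abs(x - y),
    "_diffabsrelx": lambda x, y: abs(x - y) / abs(x),
    "_diffabsrely": lambda x, y: abs(x - y) / abs(y),
    "_diffabsrelmin": lambda x, y: abs(x - y) / min(abs(x), abs(y)),
    "_diffabsrelmax": lambda x, y: abs(x - y) / max(abs(x), abs(y)),
    "_diffabsrelavg": lambda x, y: 2 * abs(x - y) / abs(x + y),
}

def compute_differences(name: str, x: float, y: float,
                        unit: Optional[str]) -> Dict[str, float]:
    """Return {name + postfix: value} for all difference types.

    unit=None models a missing unit attribute and is rejected;
    unit="" is accepted for unitless quantities.
    """
    if unit is None:
        raise ValueError(f"variable '{name}' has no unit attribute")
    result: Dict[str, float] = {}
    for postfix, formula in DIFFERENCES.items():
        try:
            result[name + postfix] = formula(x, y)
        except ZeroDivisionError:
            # Relative differences are undefined for a zero denominator.
            result[name + postfix] = math.nan
    return result

diffs = compute_differences("o3", 4.0, 2.0, unit="DU")
print(diffs["o3_diffrelavg"])  # 0.6666666666666666
```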
Also make sure to add a 'point distance' difference if both datasets have (center) lat/lon values. How do we name the lat/lon point distance?
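For the 'point distance', one common choice is the great-circle distance between the two (center) lat/lon points. The sketch below uses the haversine formula on a sphere with an assumed mean Earth radius of 6371 km; the function name is hypothetical, and HARP may prefer a different Earth model.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def point_distance_km(lat1: float, lon1: float,
                      lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula; min() guards against rounding slightly above 1.
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

# One degree of longitude along the equator.
print(round(point_distance_km(0.0, 0.0, 0.0, 1.0), 3))  # 111.195
```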
Do we want a surface overlap fraction, area_distance, or area_overlap_fraction/area_intersection_fraction?
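For all types of differences we should also add uncertainty propagation. Writing $\sigma(a)$ for the uncertainty of $a$ and treating $a$ and $b$ as independent, the uncertainty of $a-b$ is:

$$\sigma(a-b) = \sqrt{\sigma(a)^2 + \sigma(b)^2}$$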
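Each closed-form expression in this section can be sanity-checked against generic first-order (linear) propagation for independent inputs. The hypothetical helper `propagate` below estimates the partial derivatives with central finite differences (the step size `h` is an assumed default); for a-b it reproduces the root-sum-square of the two uncertainties.

```python
import math
from typing import Callable

def propagate(f: Callable[[float, float], float], a: float, b: float,
              ua: float, ub: float, h: float = 1e-6) -> float:
    """First-order uncertainty of f(a, b) for independent a and b.

    Partial derivatives are estimated with central finite differences.
    """
    dfda = (f(a + h, b) - f(a - h, b)) / (2 * h)
    dfdb = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return math.hypot(dfda * ua, dfdb * ub)

def uncertainty_diff(ua: float, ub: float) -> float:
    """Closed form for the uncertainty of a-b: sqrt(ua^2 + ub^2)."""
    return math.hypot(ua, ub)

print(uncertainty_diff(0.3, 0.4))                              # about 0.5
print(propagate(lambda a, b: a - b, 10.0, 4.0, 0.3, 0.4))      # about 0.5
```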
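Uncertainty of (a-b)/a:

$$\sigma\left(\frac{a-b}{a}\right) = \sigma\left(1 - \frac{b}{a}\right) = \sigma\left(\frac{b}{a}\right) = \left|\frac{b}{a}\right| \sqrt{\left(\frac{\sigma(a)}{a}\right)^2 + \left(\frac{\sigma(b)}{b}\right)^2}$$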
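Uncertainty of (a-b)/b:

$$\sigma\left(\frac{a-b}{b}\right) = \sigma\left(\frac{a}{b} - 1\right) = \sigma\left(\frac{a}{b}\right) = \left|\frac{a}{b}\right| \sqrt{\left(\frac{\sigma(a)}{a}\right)^2 + \left(\frac{\sigma(b)}{b}\right)^2}$$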
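A minimal sketch of the two relative-difference uncertainties (function names hypothetical). Written as above, both formulas divide by a and b, so they are undefined when either value is zero. Multiplying out gives the algebraically equal forms sqrt((b·uncert(a)/a^2)^2 + (uncert(b)/a)^2) for (a-b)/a, which only needs a ≠ 0, and sqrt((uncert(a)/b)^2 + (a·uncert(b)/b^2)^2) for (a-b)/b, which only needs b ≠ 0. The sketch uses these expanded forms and checks them against the formulas above.

```python
import math

def uncertainty_diffrelx(a: float, b: float, ua: float, ub: float) -> float:
    """Uncertainty of (a-b)/a (requires a != 0).

    Equal to |b/a| * sqrt((ua/a)^2 + (ub/b)^2), but also valid for b == 0.
    """
    return math.hypot(b * ua / a**2, ub / a)

def uncertainty_diffrely(a: float, b: float, ua: float, ub: float) -> float:
    """Uncertainty of (a-b)/b (requires b != 0).

    Equal to |a/b| * sqrt((ua/a)^2 + (ub/b)^2), but also valid for a == 0.
    """
    return math.hypot(ua / b, a * ub / b**2)

a, b, ua, ub = 4.0, 2.0, 0.1, 0.2
# Compare against the formulas as written in the issue.
print(uncertainty_diffrelx(a, b, ua, ub), abs(b / a) * math.hypot(ua / a, ub / b))
print(uncertainty_diffrely(a, b, ua, ub), abs(a / b) * math.hypot(ua / a, ub / b))
```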
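Uncertainty of 2(a-b)/(a+b), rewriting the relative difference as 2(1/(1+b/a) - 1/(1+a/b)):

$$\sigma\left(\frac{2(a-b)}{a+b}\right) = 2\,\sigma\left(\frac{1}{1+b/a} - \frac{1}{1+a/b}\right)$$

For the individual terms:

$$\sigma\left(\frac{1}{1+a/b}\right) = \frac{ab}{(a+b)^2}\sqrt{\left(\frac{\sigma(a)}{a}\right)^2 + \left(\frac{\sigma(b)}{b}\right)^2} = \frac{1}{(a+b)^2}\sqrt{\left(\sigma(a)\,b\right)^2 + \left(\sigma(b)\,a\right)^2}$$

Also, since $\frac{1}{1+b/a} = 1 - \frac{1}{1+a/b}$:

$$\sigma\left(\frac{1}{1+b/a}\right) = \sigma\left(\frac{1}{1+a/b}\right)$$

Because the two terms are fully (anti-)correlated, the uncertainty of their difference is the sum of their uncertainties (instead of the 2-norm):

$$\sigma\left(\frac{2(a-b)}{a+b}\right) = 2 \cdot 2\,\sigma\left(\frac{1}{1+a/b}\right) = \left(\frac{2}{a+b}\right)^2 \sqrt{\left(\sigma(a)\,b\right)^2 + \left(\sigma(b)\,a\right)^2}$$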
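A sketch of the final expression for the `_diffrelavg` uncertainty (function names hypothetical, and returning NaN for a + b == 0 is an assumed convention). It is checked against differentiating 2(a-b)/(a+b) directly: the partial derivatives are 4b/(a+b)^2 with respect to a and -4a/(a+b)^2 with respect to b, which give the same value as four times the uncertainty of the single term b/(a+b).

```python
import math

def uncertainty_term(a: float, b: float, ua: float, ub: float) -> float:
    """Uncertainty of 1/(1+a/b) = b/(a+b): sqrt((ua*b)^2 + (ub*a)^2) / (a+b)^2."""
    return math.hypot(ua * b, ub * a) / (a + b) ** 2

def uncertainty_diffrelavg(a: float, b: float, ua: float, ub: float) -> float:
    """Uncertainty of 2(a-b)/(a+b): (2/(a+b))^2 * sqrt((ua*b)^2 + (ub*a)^2)."""
    s = a + b
    if s == 0:
        return math.nan  # relative difference undefined when a + b == 0
    return (2 / s) ** 2 * math.hypot(ua * b, ub * a)

a, b, ua, ub = 4.0, 2.0, 0.1, 0.2
# Direct first-order propagation with the analytic partial derivatives.
direct = math.hypot(4 * b / (a + b) ** 2 * ua, 4 * a / (a + b) ** 2 * ub)
print(uncertainty_diffrelavg(a, b, ua, ub))  # about 0.0916
print(4 * uncertainty_term(a, b, ua, ub))    # same value
print(direct)                                # same value
```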
Some quantities may require special treatment for the calculation of the difference:
We may also want to add differences of intervals (in terms of intersection length):
- `_intersect`: intersect(x,y) = intersection of x_bounds and y_bounds (can also be an area intersection when using the lat/lon bounds of x and y)
- `_intersectrelx`: intersect(x,y)/length(x)
- `_intersectrely`: intersect(x,y)/length(y)
- `_intersectrelmin`: intersect(x,y)/min(length(x),length(y))
- `_intersectrelmax`: intersect(x,y)/max(length(x),length(y))
- `_intersectrelavg`: 2*intersect(x,y)/(length(x)+length(y))
- `_intersectrelunion`: intersect(x,y)/(length(x)+length(y)-intersect(x,y)) = relative to the union
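A minimal sketch of the interval metrics for one-dimensional bounds (for example vertical or time bounds), assuming each interval is a pair of bounds in either order with nonzero length. The function names are hypothetical, and the lat/lon area-intersection case is not covered.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # two bounds, in either order

def length(x: Interval) -> float:
    return abs(x[1] - x[0])

def intersect(x: Interval, y: Interval) -> float:
    """Length of the overlap of two intervals (0.0 if they are disjoint)."""
    x_lo, x_hi = sorted(x)
    y_lo, y_hi = sorted(y)
    return max(0.0, min(x_hi, y_hi) - max(x_lo, y_lo))

def intersection_metrics(x: Interval, y: Interval) -> Dict[str, float]:
    """All proposed _intersect* quantities; assumes nonzero lengths."""
    i, lx, ly = intersect(x, y), length(x), length(y)
    return {
        "_intersect": i,
        "_intersectrelx": i / lx,
        "_intersectrely": i / ly,
        "_intersectrelmin": i / min(lx, ly),
        "_intersectrelmax": i / max(lx, ly),
        "_intersectrelavg": 2 * i / (lx + ly),
        "_intersectrelunion": i / (lx + ly - i),  # overlap relative to the union
    }

m = intersection_metrics((0.0, 10.0), (5.0, 20.0))
print(m["_intersectrelavg"], m["_intersectrelunion"])  # 0.4 0.25
```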