Create harpdiff tool

svniemeijer commented 7 years ago

This tool should create a new dataset that contains the differences between two input datasets. The input datasets must either both have a collocation_index variable (which is then used to match up pairs) or already be temporally aligned.
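A minimal sketch of the collocation_index matching. It assumes each dataset is held as a plain Python dict of equal-length lists (a hypothetical stand-in for a HARP product) and that every collocation_index value occurs at most once per dataset. The function name align_by_collocation_index is made up for illustration.

```python
def align_by_collocation_index(x, y):
    """Pair up rows of two datasets that share a collocation_index.

    x and y are dicts of equal-length lists (hypothetical product layout).
    Returns (x_rows, y_rows) such that x_rows[i] and y_rows[i] are the row
    indices of the i-th matching pair, ordered by collocation_index.
    Rows whose collocation_index appears in only one dataset are dropped.
    """
    # Assumes each collocation_index occurs at most once per dataset.
    x_pos = {ci: row for row, ci in enumerate(x["collocation_index"])}
    y_pos = {ci: row for row, ci in enumerate(y["collocation_index"])}
    common = sorted(x_pos.keys() & y_pos.keys())
    return [x_pos[ci] for ci in common], [y_pos[ci] for ci in common]


x = {"collocation_index": [3, 1, 2], "ozone": [10.0, 11.0, 12.0]}
y = {"collocation_index": [2, 3, 5], "ozone": [9.0, 10.5, 8.0]}
x_rows, y_rows = align_by_collocation_index(x, y)
# Each pair holds the x and y values that belong to the same collocation.
pairs = [(x["ozone"][i], y["ozone"][j]) for i, j in zip(x_rows, y_rows)]
```

Only collocation indices 2 and 3 occur in both datasets, so two pairs remain; index 1 (only in x) and index 5 (only in y) are dropped.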

There are different ways to calculate differences, and we should consider supporting the following types (using 'x' and 'y' as names for the two datasets, and 'x-y' as the baseline difference). These are the postfixes that should be added to the variable names:

_diff: x-y
_diffrelx: (x-y)/x
_diffrely: (x-y)/y
_diffrelmin: (x-y)/min(x,y)
_diffrelmax: (x-y)/max(x,y)
_diffrelavg: 2(x-y)/(x+y)
_diffabs: |x-y|
_diffabsrelx: |x-y|/|x|
_diffabsrely: |x-y|/|y|
_diffabsrelmin: |x-y|/min(|x|,|y|)
_diffabsrelmax: |x-y|/max(|x|,|y|)
_diffabsrelavg: 2|x-y|/|x+y|
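The postfix list above maps directly onto a table of formulas. The sketch below is hypothetical (diff_variants is not part of HARP) and works on scalar floats only; an actual tool would apply the same formulas element-wise on arrays and would need a policy for zero denominators.

```python
def diff_variants(x, y):
    """Return every difference variant listed above, keyed by its postfix.

    x and y are plain floats here. Division by zero is not handled:
    this sketch simply lets Python raise ZeroDivisionError.
    """
    d = x - y
    ad = abs(d)
    return {
        "_diff": d,
        "_diffrelx": d / x,
        "_diffrely": d / y,
        "_diffrelmin": d / min(x, y),
        "_diffrelmax": d / max(x, y),
        "_diffrelavg": 2 * d / (x + y),
        "_diffabs": ad,
        "_diffabsrelx": ad / abs(x),
        "_diffabsrely": ad / abs(y),
        "_diffabsrelmin": ad / min(abs(x), abs(y)),
        "_diffabsrelmax": ad / max(abs(x), abs(y)),
        "_diffabsrelavg": 2 * ad / abs(x + y),
    }


result = diff_variants(4.0, 2.0)
```

For x = 4 and y = 2 this gives _diff = 2, _diffrelx = 0.5, _diffrely = 1.0 and _diffrelavg = 2/3.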

Calculating differences will only be supported for variables that have a unit attribute (which may be empty for unitless quantities, but should not be omitted).
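One way to enforce the unit-attribute rule, sketched with a hypothetical in-memory product layout: a dict mapping each variable name to a dict with 'data' and an optional 'unit'. A variable is diffed only when both datasets carry a 'unit' key, and an empty string counts as a valid (unitless) unit. Raising on mismatching units is an assumption of this sketch (a real tool might convert units instead), and the rows are assumed to be aligned already. The name diff_product is made up.

```python
def diff_product(x, y):
    """Return a new product with '<name>_diff' = x - y for each shared variable.

    Variables without a 'unit' key (in either product) are skipped; an
    empty unit string marks a unitless quantity and is still accepted.
    """
    result = {}
    for name, xvar in x.items():
        yvar = y.get(name)
        if yvar is None or "unit" not in xvar or "unit" not in yvar:
            continue  # not in both products, or unit attribute omitted
        if xvar["unit"] != yvar["unit"]:
            # Assumption: no unit conversion in this sketch.
            raise ValueError(f"unit mismatch for {name!r}: {xvar['unit']!r} vs {yvar['unit']!r}")
        result[name + "_diff"] = {
            "data": [a - b for a, b in zip(xvar["data"], yvar["data"])],
            "unit": xvar["unit"],
        }
    return result


x = {"ozone": {"data": [10.0, 12.0], "unit": "DU"},
     "cloud_fraction": {"data": [0.5, 0.2], "unit": ""},
     "flag": {"data": [1.0, 0.0]}}
y = {"ozone": {"data": [9.0, 12.5], "unit": "DU"},
     "cloud_fraction": {"data": [0.25, 0.2], "unit": ""},
     "flag": {"data": [1.0, 1.0]}}
diffs = diff_product(x, y)
```

Here 'flag' has no unit attribute and is therefore left out, while the unitless 'cloud_fraction' (empty unit) is still diffed.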

Also make sure to add a 'point distance' difference if both datasets have (center) lat/lon values. How do we name the lat/lon point distance?
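For the point distance, a common choice is the great-circle distance from the haversine formula. This sketch assumes a spherical Earth with a mean radius of 6371 km and latitudes/longitudes in degrees; HARP itself may use a different Earth model, and point_distance is only a placeholder name (the naming question above is still open).

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius (spherical model)


def point_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp h to 1.0 to guard against rounding pushing it slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))
```

Two antipodal points on the equator, e.g. (0, 0) and (0, 180), come out at half the sphere's circumference, pi * 6371 km.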

Do we want a surface overlap fraction, area_distance, or area_overlap_fraction/area_intersection_fraction?

For all types of differences we should also add uncertainty propagation:

the uncertainty of (a-b) will be sqrt(uncert(a)^2 + uncert(b)^2).
uncertainty of (a-b)/a = uncertainty of 1 - b/a = uncertainty of b/a = |b/a| sqrt( ('uncertainty of a'/a)^2 + ('uncertainty of b'/b)^2 )
uncertainty of (a-b)/b = uncertainty of a/b - 1 = uncertainty of a/b = |a/b| sqrt( ('uncertainty of a'/a)^2 + ('uncertainty of b'/b)^2 )
uncertainty of 2(a-b)/(a+b): write 2(a-b)/(a+b) = 2 * (1/(1+b/a) - 1/(1+a/b)), then
  uncertainty of 1/(1+a/b) = |ab|/(a+b)^2 sqrt( ('uncertainty of a'/a)^2 + ('uncertainty of b'/b)^2 ) = 1/(a+b)^2 sqrt( ('uncertainty of a' * b)^2 + ('uncertainty of b' * a)^2 )
  also, uncertainty of 1/(1+b/a) = uncertainty of 1/(1+a/b), because 1/(1+b/a) = 1 - 1/(1+a/b)
  by assuming the two terms to be fully correlated, we can just add their uncertainties (instead of taking the 2-norm): uncertainty of 2(a-b)/(a+b) = 2 * ( 2 * uncertainty of 1/(1+a/b) ) = (2/(a+b))^2 sqrt( ('uncertainty of a' * b)^2 + ('uncertainty of b' * a)^2 )
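The closed-form expressions above can be checked against generic first-order error propagation, sqrt((df/da * uncert(a))^2 + (df/db * uncert(b))^2), with the partial derivatives estimated by central differences. Like the formulas, this assumes a and b are independent. All function names are hypothetical, and the two relative variants are rewritten in an algebraically equivalent form that avoids an unnecessary division by b (or a).

```python
import math


def unc_diff(a, b, ua, ub):
    """Uncertainty of a - b: sqrt(ua^2 + ub^2)."""
    return math.hypot(ua, ub)


def unc_diffrelx(a, b, ua, ub):
    """Uncertainty of (a - b)/a = |b/a| sqrt((ua/a)^2 + (ub/b)^2).

    Written as sqrt((b*ua/a^2)^2 + (ub/a)^2), which is equal but allows b == 0.
    """
    return math.hypot(b * ua / a**2, ub / a)


def unc_diffrely(a, b, ua, ub):
    """Uncertainty of (a - b)/b = |a/b| sqrt((ua/a)^2 + (ub/b)^2), written to allow a == 0."""
    return math.hypot(ua / b, a * ub / b**2)


def unc_diffrelavg(a, b, ua, ub):
    """Uncertainty of 2(a - b)/(a + b) = (2/(a+b))^2 sqrt((ua*b)^2 + (ub*a)^2)."""
    return (2 / (a + b)) ** 2 * math.hypot(ua * b, ub * a)


def linear_propagation(f, a, b, ua, ub, h=1e-6):
    """Numerical cross-check: first-order propagation with central-difference derivatives."""
    dfda = (f(a + h, b) - f(a - h, b)) / (2 * h)
    dfdb = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return math.hypot(dfda * ua, dfdb * ub)
```

For a = 3, b = 1, uncert(a) = 0.1 and uncert(b) = 0.2, unc_diffrelavg gives (2/4)^2 * sqrt(0.1^2 + 0.6^2) ≈ 0.152, and linear_propagation applied to f(a, b) = 2(a-b)/(a+b) agrees to within rounding.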

Some quantities may require special treatment for the calculation of the difference:

use modulo 360 for longitude and azimuth angles
how to deal with quantities with logarithmic units (e.g. 'dB')?
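For longitude and azimuth differences, one reasonable reading of "modulo 360" is to wrap x - y into the range [-180, 180), so that the result is the signed shortest angular difference. The range choice and the name angle_diff are assumptions of this sketch.

```python
def angle_diff(x, y):
    """Difference x - y of two angles in degrees, wrapped into [-180, 180).

    Python's % with a positive divisor always returns a value in [0, 360),
    so shifting by 180 before and after the modulo yields the signed
    shortest angular difference.
    """
    return (x - y + 180.0) % 360.0 - 180.0
```

Without the wrap, longitudes of 350 and 10 degrees would differ by 340 degrees; with it, the difference is -20 degrees.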

We may also want to add differences of intervals (in terms of intersection length):

_intersect: intersect(x,y) = intersection of x_bounds and y_bounds (can also be area intersection when using lat/lon bounds of x and y)
_intersectrelx: intersect(x,y)/length(x)
_intersectrely: intersect(x,y)/length(y)
_intersectrelmin: intersect(x,y)/min(length(x),length(y))
_intersectrelmax: intersect(x,y)/max(length(x),length(y))
_intersectrelavg: 2 intersect(x,y)/(length(x)+length(y))
_intersectrelunion: intersect(x,y)/(length(x)+length(y)-intersect(x,y)) (i.e. relative to the union)
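The 1-D interval measures can be sketched directly from the definitions above, assuming each interval is given as a (lower, upper) pair with lower <= upper. Area intersections from lat/lon bounds would need polygon clipping and are not covered here; intersect_variants is a hypothetical name.

```python
def interval_length(lo, hi):
    """Length of [lo, hi]; an empty interval (hi < lo) counts as 0."""
    return max(0.0, hi - lo)


def intersect_variants(x_bounds, y_bounds):
    """Intersection-based measures for two 1-D intervals given as (lower, upper)."""
    lx = interval_length(*x_bounds)
    ly = interval_length(*y_bounds)
    # Intersection runs from the larger lower bound to the smaller upper bound.
    i = interval_length(max(x_bounds[0], y_bounds[0]), min(x_bounds[1], y_bounds[1]))
    return {
        "_intersect": i,
        "_intersectrelx": i / lx,
        "_intersectrely": i / ly,
        "_intersectrelmin": i / min(lx, ly),
        "_intersectrelmax": i / max(lx, ly),
        "_intersectrelavg": 2 * i / (lx + ly),
        "_intersectrelunion": i / (lx + ly - i),
    }


m = intersect_variants((0.0, 10.0), (5.0, 20.0))
```

For x = [0, 10] and y = [5, 20] the intersection length is 5, so _intersectrelx = 0.5, _intersectrelavg = 10/25 = 0.4 and _intersectrelunion = 5/20 = 0.25.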

svniemeijer commented 6 years ago

We should create this in such a way that the core function is a C library function that returns a HARP product. We can then also introduce a harp.diff Python function that returns the difference of two products/datasets (using the same underlying code).

svniemeijer commented 6 years ago

See also Wikipedia. It is probably better to use _diffabsrelavg: 2|x-y|/(|x|+|y|). We might also want to distinguish between absolute/signed differences on the one hand and absolute/signed scaling of relative differences on the other. For instance, we might want to use absolute scaling for a signed difference: 2(x-y)/(|x|+|y|)
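The difference between the two averaging choices shows up when x and y have opposite signs. The three hypothetical functions below implement 2|x-y|/|x+y|, the suggested 2|x-y|/(|x|+|y|), and the signed variant with absolute scaling 2(x-y)/(|x|+|y|).

```python
def diffabsrelavg_abs_of_sum(x, y):
    """Original proposal: 2|x-y| / |x+y| (undefined when x == -y)."""
    return 2 * abs(x - y) / abs(x + y)


def diffabsrelavg_sum_of_abs(x, y):
    """Suggested alternative: 2|x-y| / (|x| + |y|), within [0, 2] by the triangle inequality."""
    return 2 * abs(x - y) / (abs(x) + abs(y))


def diffrelavg_abs_scaled(x, y):
    """Signed difference with absolute scaling: 2(x-y) / (|x| + |y|), within [-2, 2]."""
    return 2 * (x - y) / (abs(x) + abs(y))
```

For x = 1 and y = -3, 2|x-y|/|x+y| evaluates to 4, and for x = 1, y = -1 it divides by zero, whereas 2|x-y|/(|x|+|y|) stays within [0, 2] (2.0 in both cases) and is only undefined when x = y = 0. For two positive values all variants coincide.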

stcorp / harp

Create harpdiff tool #136