stcorp / harp

Data harmonization toolset for scientific earth observation data
http://stcorp.github.io/harp/doc/html/index.html
BSD 3-Clause "New" or "Revised" License
55 stars 18 forks source link

*_uncertainty_random after spatial binning: missing division by total weight? #302

Closed tdanckaert closed 6 months ago

tdanckaert commented 7 months ago

Hi,

it seems the result of *_uncertainty_random after spatial binning does not behave as expected given the formula in https://stcorp.github.io/harp/doc/html/algorithms/regridding.html#spatial-binning (expected: a reduction by ~1/srqrt(number of observations) when compared to the random error on a single observation). For example, we've noticed that the random uncertainty for pixels in the overlap of two orbits tends to increase, rather than decrease.

Some tests by my colleagues indicate that we do get the desired behaviour, if we divide the binned *uncertainty_random by the corresponding weight values provided by Harp. Looking at the code in https://github.com/stcorp/harp/blob/master/libharp/harp-binning.c#L2688, it seems like the final division by the total weight is only performed if (bintype[k] == binning_average), and not for bintype binning_uncertainty, i.e. the denominator from the formula in the algorithm documentation is missing. Is it possible that this is an omission harp-binning.c? (by the way, I think an extra variable would be needed to keep track of the total weight in this case, as the variable weight[] for the binning_uncertainty case contains the sum of the squared sample_weights?)

svniemeijer commented 7 months ago

You are right. We are dividing the random uncertainty for the weight correctly for the bin() operation (https://github.com/stcorp/harp/blob/master/libharp/harp-binning.c#L2008), but not for the bin_spatial() operation. We will fix this.

I am not sure I understand your weight variable request, because we still divide by the sum of the weights (not the sum of their squares).

tdanckaert commented 7 months ago

I am not sure I understand your weight variable request, because we still divide by the sum of the weights (not the sum of their squares).

It was just a suggestion because I had the impression that due to

if (bintype[k] == binning_uncertainty) { sample_weight *= sample_weight; } on line 2598, and weight[target_index * num_sub_elements + j] += sample_weight; on line 2622, the weight[] variable would hold the sum of the squared sample weights in the binning_uncertainty case. Either way, thank you for looking into it!

svniemeijer commented 7 months ago

That seems like a bug as well. We should not be dividing by (and storing) the sum of the squared weights. We will fix this.

svniemeijer commented 6 months ago

Fixed in 174ee4b0110c3fffc08809dd4fb980f61e02bb67 and 875a2eba8ff8edbc6da3f0249cbdc5e3b172e8b0