pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.56k stars 1.07k forks

Weighted quantile calculation is incorrect #9302

Open david-cortes opened 1 month ago

david-cortes commented 1 month ago

What happened?

DataArrayWeighted.quantile returns incorrect numbers for weighted quantiles.

As an example, passing a quantile of 1 should return the maximum value by definition, but when there are weights, it will oftentimes return a different number.

What did you expect to happen?

A quantile of 1 should return the maximum value. Note that the calculation is also incorrect for lower quantiles, not just for q=1.
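For reference, the unweighted behaviour is easy to check with plain NumPy (a minimal sketch, not xarray code):

```python
import numpy as np

x = np.array([1, 10, 100])
# By definition, the 0- and 1-quantiles are the sample min and max.
print(np.quantile(x, 0.0))  # 1.0
print(np.quantile(x, 1.0))  # 100.0
```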

Minimal Complete Verifiable Example

import numpy as np
import xarray

(
    xarray.DataArray(
        np.array([1,10,100])
    )
    .weighted(
        xarray.DataArray(
            np.array([10,10,5])
        )
    )
    .quantile(
        1.
    )
    .to_numpy()
)
array(60.)

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 (main, Mar 1 2023, 18:26:19) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 6.1.0-23-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None

xarray: 2023.6.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.8
dask: 2023.6.0
distributed: 2023.6.0
matplotlib: 3.7.2
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: 23.7.3
pytest: 7.4.0
mypy: 1.6.1
IPython: 8.12.2
sphinx: 5.0.2
welcome[bot] commented 1 month ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

max-sixty commented 1 month ago

That does look wrong, thanks for flagging.

I had a quick look but couldn't immediately see the issue. Tagging @cjauvin & @huard as the authors in case they want to have a look.

huard commented 3 weeks ago

The algorithm has no special logic for q=0 or 1. It uses the "linear" (type 7) interpolation method.

I understand the logic for setting q=1 to the max of the sample, but that would imply that weights have no effect on the results. I suspect we'd get discontinuities as we go from q=0.9999 (where weights would matter) to q=1 where we'd ignore weights.

All in all, I welcome improvements to the algo used here, but I'm doubtful we're talking about a simple quick fix.

david-cortes commented 3 weeks ago

The algorithm has no special logic for q=0 or 1. It uses the "linear" (type 7) interpolation method.

I understand the logic for setting q=1 to the max of the sample, but that would imply that weights have no effect on the results. I suspect we'd get discontinuities as we go from q=0.9999 (where weights would matter) to q=1 where we'd ignore weights.

All in all, I welcome improvements to the algo used here, but I'm doubtful we're talking about a simple quick fix.

Calculations also look incorrect for other quantiles like 0.5 or 0.7. For example:

import numpy as np
import xarray

(
    xarray.DataArray(
        np.array([1,10,100])
    )
    .weighted(
        xarray.DataArray(
            np.array([10,10,5])
        )
    )
    .quantile(
        np.linspace(0,1,11)
    )
    .to_numpy()
)
array([ 1. ,  1.6,  3.2,  4.8,  6.4,  8. ,  9.6, 12. , 28. , 44. , 60. ])

... from which one can see that quantiles 0.1 and 0.2 are also incorrect: with 40% of the total weight on the value 1, the low quantiles should stay at or very near 1.

As a reference point, I think the package wquantiles also works by linear interpolation, and the results look reasonable at first sight (haven't verified them):

from wquantiles import quantile
[
    quantile(
        np.array([1,10,100]),
        np.array([10,10,5]),
        q
    )
    for q in np.linspace(0,1,11)
]
[1.0,
 1.0,
 1.0,
 3.250000000000001,
 5.500000000000001,
 7.750000000000001,
 10.000000000000034,
 40.00000000000002,
 70.0,
 100.0,
 100.0]
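(For context, wquantiles' calculation boils down to roughly the sketch below; `weighted_quantile` is a hypothetical helper for illustration, not xarray's implementation:)

```python
import numpy as np

def weighted_quantile(data, weights, q):
    # Hypothetical helper sketching wquantiles-style linear interpolation.
    order = np.argsort(data)
    d = np.asarray(data, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    # Plotting positions: cumulative weight, centered on each sample.
    p = (np.cumsum(w) - 0.5 * w) / w.sum()
    return np.interp(q, p, d)

print(weighted_quantile([1, 10, 100], [10, 10, 5], 0.5))  # ~7.75
```

This reproduces the wquantiles numbers above, including q=0 clamping to the minimum and q=1 to the maximum.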
huard commented 3 weeks ago

I'm not sure I understand why those results are incorrect. Our reference implementation can be found here: https://aakinshin.net/weighted-quantile-estimators/ and there is a unit test comparing results with those obtained from running the reference implementation.

My understanding is that there are multiple ways to compute weighted quantiles, and we picked one that looked reasonable, but I'm not knowledgeable enough to argue about the merits of the different algorithms.

If there's a bug in our implementation of the reference algo, then let's find it and fix it.

max-sixty commented 3 weeks ago

I'm no expert and haven't read the paper in full, but it does seem really surprising that the values for a 1.0 quantile (or 0.99) would be 60 rather than 100 for data (1, 10, 100) with weights (10, 10, 5)...

I would suspect the median value would be lower than 10 given 100 has less weight, but it would linearly approach 100 in the highest 5/25 of the quantiles. I agree there shouldn't be special-casing of 0 or 1. From a distance, it looks like something hasn't been normalized correctly.

If anyone has easy access to an R environment, would they be up for testing what this case returns with the code in https://aakinshin.net/weighted-quantile-estimators/ ?

david-cortes commented 3 weeks ago

I'm not sure I understand why those results are incorrect. Our reference implementation can be found here: https://aakinshin.net/weighted-quantile-estimators/ and there is a unit test comparing results with those obtained from running the reference implementation.

My understanding is that there are multiple ways to compute weighted quantiles, and we picked one that looked reasonable, but I'm not knowledgeable enough to argue about the merits of the different algorithms.

If there's a bug in our implementation of the reference algo, then let's find it and fix it.

If you want an even easier reference and possible unit test, the result from weighted quantiles with integer weights should roughly match what you'd get from repeating each input the number of times specified by its weight. You can see that the xarray outputs in that case would be very different:

import numpy as np
import xarray

(
    xarray.DataArray(
        np.array([1]*10 + [10]*10 + [100]*5)
    )
    .quantile(
        np.linspace(0,1,11)
    )
    .to_numpy()
)
array([  1. ,   1. ,   1. ,   1. ,   6.4,  10. ,  10. ,  10. ,  28. ,
       100. , 100. ])
huard commented 3 weeks ago

Got it. I remember there was a discussion somewhere about the various interpretations of weighted quantiles, one being a repetition of samples (frequency weights, numpy's fweights), and the other being the "mass" of samples (aweights), and we ended up selecting the later, because it was closer in meaning to other weighted calculations offered by xarray (if I remember correctly).