jasperroebroek / sklearn-quantile

BSD 3-Clause "New" or "Revised" License

`"weights": "distance"` doesn't work when `n_neighbors` is not big enough #9

Closed: kerim371 closed this issue 4 months ago

kerim371 commented 7 months ago

Hi,

I'm trying to use KNeighborsQuantileRegressor. It works fine with uniform weights, but it fails with "weights": "distance" when n_neighbors < 41 and X has about 1000 rows.

params_quantile_model = {
    "n_neighbors": 21,      # 7
    "weights": "distance", # uniform
    "metric": "minkowski", # "minkowski",        # l1
    "p": 2,
    "q": [0.5, 0.2, 0.8],
    "n_jobs": -1
}

The error is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[290], line 44
     36 quantile_model = Pipeline(
     37     [
     38         ("scaler", MinMaxScaler()),
     39         ("model", KNeighborsQuantileRegressor(**params_quantile_model)),
     40     ]
     41 )
     42 quantile_model.fit(X_train, y_train)
---> 44 quantile_prediction = quantile_model.predict(X_test)
     45 preds_model = quantile_prediction[0]
     46 preds_model_lower = quantile_prediction[1]

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn/pipeline.py:515, in Pipeline.predict(self, X, **predict_params)
    513 for _, name, transform in self._iter(with_final=False):
    514     Xt = transform.transform(Xt)
--> 515 return self.steps[-1][1].predict(Xt, **predict_params)

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn_quantile/neighbors/quantile.py:191, in KNeighborsQuantileRegressor.predict(self, X)
    188     weights = np.broadcast_to(weights[:, :, np.newaxis], a.shape)
    190 # this falls back on np.quantile if weights is None
--> 191 y_pred = weighted_quantile(a, q, weights, axis=1)
    193 if self._y.ndim == 1:
    194     y_pred = y_pred[..., 0]

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn_quantile/utils/weighted_quantile.pyx:339, in sklearn_quantile.utils.weighted_quantile.weighted_quantile()

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn_quantile/utils/weighted_quantile.pyx:238, in sklearn_quantile.utils.weighted_quantile._weighted_quantile_unchecked()

ValueError: NumPy boolean array indexing assignment cannot assign 344 input values to the 1 output values where the mask is true

jasperroebroek commented 7 months ago

Could you create a minimal working example? I tried to reproduce this error with random data, but it doesn't occur on my machine.

kerim371 commented 7 months ago

@jasperroebroek hi,

Sure, here is the example:

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn_quantile import KNeighborsQuantileRegressor

N = 2500
X = pd.DataFrame({
    'X': np.random.rand(N),
    'Y': np.random.rand(N),
    'Z': np.random.rand(N),
    'M': np.random.rand(N),
    'target': np.random.rand(N),
})
y = X.pop('target')

params_quantile_model = {
    "n_neighbors": 7,
    "weights": "distance",
    "metric": "minkowski",
    "p": 2,
    "n_jobs": -1
}
params_quantile_model["q"] = [0.5, 0.2, 0.8]
quantile_model = Pipeline(
    [
        ("scaler", MinMaxScaler()),
        ("model", KNeighborsQuantileRegressor(**params_quantile_model)),
    ]
)
quantile_model.fit(X, y)

# the error is raised at the prediction step
quantile_prediction = quantile_model.predict(X)

This gives the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 17
      9 quantile_model = Pipeline(
     10     [
     11         ("scaler", MinMaxScaler()),
     12         ("model", KNeighborsQuantileRegressor(**params_quantile_model)),
     13     ]
     14 )
     15 quantile_model.fit(X, y)
---> 17 quantile_prediction = quantile_model.predict(X)
     18 preds_model = quantile_prediction[0]
     19 preds_model_lower = quantile_prediction[1]

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn/pipeline.py:515, in Pipeline.predict(self, X, **predict_params)
    513 for _, name, transform in self._iter(with_final=False):
    514     Xt = transform.transform(Xt)
--> 515 return self.steps[-1][1].predict(Xt, **predict_params)

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn_quantile/neighbors/quantile.py:191, in KNeighborsQuantileRegressor.predict(self, X)
    188     weights = np.broadcast_to(weights[:, :, np.newaxis], a.shape)
    190 # this falls back on np.quantile if weights is None
--> 191 y_pred = weighted_quantile(a, q, weights, axis=1)
    193 if self._y.ndim == 1:
    194     y_pred = y_pred[..., 0]

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn_quantile/utils/weighted_quantile.pyx:339, in sklearn_quantile.utils.weighted_quantile.weighted_quantile()

File ~/work/overpressure_final/venv/lib/python3.9/site-packages/sklearn_quantile/utils/weighted_quantile.pyx:238, in sklearn_quantile.utils.weighted_quantile._weighted_quantile_unchecked()

ValueError: NumPy boolean array indexing assignment cannot assign 2500 input values to the 5000 output values where the mask is true

Versions: NumPy 1.26.2, pandas 2.1.3, scikit-learn 1.3.2, Python 3.9.5
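
For readers unfamiliar with this message: NumPy raises this kind of ValueError whenever the number of values assigned through a boolean mask differs from the number of True entries in that mask. The snippet below is only a generic illustration of that error class in plain NumPy. The helper name assign_with_mask is hypothetical, and the snippet is not taken from sklearn_quantile and does not claim to show where the mismatch arises inside the library.

```python
import numpy as np


def assign_with_mask(n_values: int) -> np.ndarray:
    """Assign n_values numbers into the slots selected by a boolean mask.

    Hypothetical helper for illustration only; not part of sklearn_quantile.
    """
    out = np.zeros(4)
    mask = np.array([True, False, True, False])  # selects exactly 2 positions
    # The number of assigned values must match the number of True entries.
    out[mask] = np.arange(n_values, dtype=float)
    return out


# Two values for two selected slots: the assignment succeeds.
print(assign_with_mask(2))

# Three values for two selected slots: NumPy raises a ValueError of the
# same kind as the one in the traceback above.
try:
    assign_with_mask(3)
except ValueError as exc:
    print(exc)
```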

jasperroebroek commented 4 months ago

Sorry to keep you waiting. Solving this issue required significant changes to the weighted quantile calculations. In the latest version, v0.0.32, which I just uploaded to PyPI, this issue should be solved. Let me know if the error still occurs for you.
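
A minimal sketch for checking whether an environment already has the release mentioned above. It assumes the distribution is registered under the name "sklearn-quantile" and that its version string is purely numeric (for example "0.0.32"). The helpers parse_version and has_fix are hypothetical names introduced here, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version

# Release named in the comment above as containing the fix.
FIXED_IN = (0, 0, 32)


def parse_version(text: str) -> tuple:
    """Turn a purely numeric version string such as '0.0.32' into a tuple."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed: str) -> bool:
    """Return True if the given version is at least FIXED_IN."""
    return parse_version(installed) >= FIXED_IN


try:
    # Assumption: the installed distribution name is "sklearn-quantile".
    installed = version("sklearn-quantile")
except PackageNotFoundError:
    print("sklearn-quantile is not installed in this environment")
else:
    status = "includes" if has_fix(installed) else "predates"
    print(f"sklearn-quantile {installed} {status} v0.0.32")
```

If the installed version predates v0.0.32, upgrading the package and rerunning the example from this thread should show whether the error is gone.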