Closed mshr-h closed 1 year ago
The error occurred on the array creation. The code is in _tree_commons.py#L368.
values = np.array([np.zeros(n_classes), values[0], values[0]])
np.zeros(n_classes) is a 1-d array whereas values[0] is a 2-d or 3-d array.
I guess numpy==1.24.0 is a bit stricter about array dimensions.
Changing values[0] to values.reshape(1) worked fine on my machine.
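To make the dimension mismatch concrete, here is a minimal sketch. The values array is made up for illustration (one node, two classes) and is not taken from hummingbird. As far as I know, NumPy before 1.24 only emitted a deprecation warning for this kind of ragged input, so the outcome depends on the installed version.

```python
import numpy as np

n_classes = 2
# Made-up values for a one-node tree, laid out as (n_nodes, n_outputs, n_classes).
values = np.array([[[3.0, 4.0]]])

zeros = np.zeros(n_classes)  # 1-d, shape (2,)
leaf = values[0]             # 2-d, shape (1, 2)
print(zeros.shape, leaf.shape)

# Mixing a 1-d and a 2-d element gives a ragged (inhomogeneous) input.
try:
    stacked = np.array([zeros, leaf, leaf])
    outcome = f"built array with dtype {stacked.dtype}"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)
```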
Hello, with the fix you applied I am getting the error below. It happens when calling "fit".
File "/.../hummingbird/ml/operator_converters/_tree_commons.py", line 368, in get_parameters_for_gemm_common
    values = np.array([np.zeros(n_classes), values.reshape(1), values.reshape(1)])
ValueError: cannot reshape array of size 2 into shape (1,)
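For context, reshape(1) can only succeed when the array holds exactly one element, which would explain why the patch worked in one setup and fails here. A small sketch with made-up arrays (the helper name try_reshape_to_one is hypothetical, just for this illustration):

```python
import numpy as np

def try_reshape_to_one(arr):
    """Apply arr.reshape(1) and return the new shape, or the error message."""
    try:
        return arr.reshape(1).shape
    except ValueError as exc:
        return str(exc)

one_value = np.array([[5.0]])        # size 1: reshape(1) is valid
two_values = np.array([[3.0, 4.0]])  # size 2: reshape(1) cannot hold both elements

print(try_reshape_to_one(one_value))
print(try_reshape_to_one(two_values))
```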
@Allecst Can you include a reproducible example?
@mshr-h here's a toy example that breaks the latest version:
import torch
from hummingbird.ml import convert
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
X = torch.tensor(
    [
        [0.8912, 0.0675, 0.1040],
        [0.7769, 0.5490, 0.4558],
        [0.6801, 0.3156, 0.5628],
        [0.4307, 0.9626, 0.2438],
        [0.7453, 0.6711, 0.1513],
        [0.6511, 0.0601, 0.5880],
        [0.6508, 0.0597, 0.5854],
    ],
    dtype=torch.float64,
)
y = torch.tensor([0, 1, 1, 1, 1, 0, 0], dtype=torch.int32)
kfolds = StratifiedKFold(n_splits=3)
random_grid = {
    "n_estimators": [100],
    "max_features": ["sqrt", "log2"],
    "max_depth": [10],
    "min_samples_split": [2],
    "min_samples_leaf": [1, 2],
    "bootstrap": [True, False],
}
rfc = RandomForestClassifier(n_jobs=-1)
rfc_random = RandomizedSearchCV(
    estimator=rfc,
    param_distributions=random_grid,
    n_iter=500,
    cv=kfolds,
    verbose=0,
    random_state=1,
    scoring="roc_auc",
    return_train_score=False,
    n_jobs=-1,
)
rfc_random.fit(X, y)
container = convert(
    model=rfc_random.best_estimator_,
    backend="pytorch",
    extra_config={"n_threads": 16},
)
numpy: '1.26.1'
torch: '2.1.0+cu121'
hummingbird: '0.4.11'
sklearn: '1.3.1'
python: '3.10.13 (main, Aug 25 2023, 13:20:03) [GCC 9.4.0]'
That being said, all tests pass as far as I can see, so it looks like there are some blind spots :/ (missing coverage for if len(lefts) == 1, perhaps?)
python tests/test_sklearn_decision_tree_converter.py
..........sssss..........ssss...........................................ssssssssssss.
----------------------------------------------------------------------
Ran 85 tests in 33.559s
OK (skipped=21)
I believe the issue lies in _tree_commons.py, i.e. in get_parameters_for_gemm_common:
n_classes = values.shape[1]
values = np.array([np.zeros(n_classes), values.reshape(1), values.reshape(1)])
should probably be replaced by:
n_classes = values.shape[-1]
values = np.array([np.zeros(n_classes), values.ravel(), values.ravel()])
(since .ravel() is faster than .reshape(-1))
However, looking at the code it seems get_parameters_for_tree_trav_common might also be affected?
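Here is a rough sketch of why the replacement suggested above behaves differently. It assumes values for a single-node, two-class tree is laid out as (n_nodes, n_outputs, n_classes). Whether hummingbird passes exactly this layout at that point is an assumption on my part, and the numbers are made up.

```python
import numpy as np

# Made-up single-node, two-class values with assumed layout
# (n_nodes, n_outputs, n_classes).
values = np.array([[[3.0, 4.0]]])

# shape[1] picks n_outputs here, while shape[-1] picks n_classes.
print(values.shape[1], values.shape[-1])

n_classes = values.shape[-1]
flat = values.ravel()  # 1-d, same length as np.zeros(n_classes)
fixed = np.array([np.zeros(n_classes), flat, flat])
print(fixed.shape)

# ravel() and reshape(-1) give the same elements in the same order.
print(np.array_equal(values.ravel(), values.reshape(-1)))
```

Note that flattening only lines up with np.zeros(n_classes) when values describes a single node. With more nodes the flattened length would exceed n_classes.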
Thank you @akavalar! Can you please open a new issue with this info?
NumPy 1.24 Release Notes — NumPy v1.25.dev0 Manual
I found the error on the CI run below. Updating Windows runner OS · microsoft/hummingbird@1c27570
Here are the test results on my local machine with numpy==1.24.0.