Hi! I think you are calling the wrong API. If you want to convert an XGBoost model you can just call hummingbird.ml.convert. Please look at our notebook for an example.
Also, does the Hummingbird converter only accept features with an int datatype? Some of my data features are float values as well.
Yes, it also accepts floats, as you can see in the dataset used in this notebook.
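For instance, a minimal sketch of the intended call (the model and data names here are placeholders):

from hummingbird.ml import convert

# `model` is a fitted xgboost.XGBClassifier and `X_test` a float NumPy array;
# both names are placeholders for your own objects.
hb_model = convert(model, 'pytorch', X_test[0:1])
preds = hb_model.predict(X_test)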
ValueError Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_2889/1851059335.py in <module>
----> 1 hummingmodel = convert(model, 'pytorch',X_train[0:1])
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in convert(model, backend, test_input, device, extra_config)
442 """
443 assert constants.REMAINDER_SIZE not in extra_config
--> 444 return _convert_common(model, backend, test_input, device, extra_config)
445
446
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_common(model, backend, test_input, device, extra_config)
392
393 if type(model) in xgb_operator_list:
--> 394 return _convert_xgboost(model, backend_formatted, test_input, device, extra_config)
395
396 if type(model) in lgbm_operator_list:
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_xgboost(model, backend, test_input, device, extra_config)
151 Please pass some test_input to the converter."
152 )
--> 153 return _convert_sklearn(model, backend, test_input, device, extra_config)
154
155
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_sklearn(model, backend, test_input, device, extra_config)
109
110 # Convert the Topology object into a PyTorch model.
--> 111 hb_model = topology_converter(topology, backend, test_input, device, extra_config=extra_config)
112 return hb_model
113
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_topology.py in convert(topology, backend, test_input, device, extra_config)
220 )
221
--> 222 operator_map[operator.full_name] = converter(operator, device, extra_config)
223
224 # Set the parameters for the model / container
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in convert_sklearn_xgb_classifier(operator, device, extra_config)
105 n_classes = operator.raw_operator.n_classes_
106
--> 107 return convert_gbdt_classifier_common(
108 operator, tree_infos, _get_tree_parameters, n_features, n_classes, decision_cond="<", extra_config=extra_config
109 )
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_gbdt_commons.py in convert_gbdt_classifier_common(operator, tree_infos, get_tree_parameters, n_features, n_classes, classes, extra_config, decision_cond)
68 tree_infos = [tree_infos[i * n_classes + j] for j in range(n_classes) for i in range(len(tree_infos) // n_classes)]
69
---> 70 return convert_gbdt_common(
71 operator, tree_infos, get_tree_parameters, n_features, classes, extra_config=extra_config, decision_cond=decision_cond
72 )
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_gbdt_commons.py in convert_gbdt_common(operator, tree_infos, get_tree_parameters, n_features, classes, extra_config, decision_cond)
94 assert n_features is not None
95
---> 96 tree_parameters, max_depth, tree_type = get_tree_params_and_type(tree_infos, get_tree_parameters, extra_config)
97
98 # Apply learning rate directly on the values rather then at runtime.
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_tree_commons.py in get_tree_params_and_type(tree_infos, get_tree_parameters, extra_config)
221 The tree parameters, the maximum tree-depth and the tre implementation to use
222 """
--> 223 tree_parameters = [get_tree_parameters(tree_info, extra_config) for tree_info in tree_infos]
224 max_depth = max(1, _find_max_depth(tree_parameters))
225 tree_type = get_tree_implementation_by_config_or_depth(extra_config, max_depth)
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_tree_commons.py in <listcomp>(.0)
221 The tree parameters, the maximum tree-depth and the tre implementation to use
222 """
--> 223 tree_parameters = [get_tree_parameters(tree_info, extra_config) for tree_info in tree_infos]
224 max_depth = max(1, _find_max_depth(tree_parameters))
225 tree_type = get_tree_implementation_by_config_or_depth(extra_config, max_depth)
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in _get_tree_parameters(tree_info, extra_config)
75 for f_id, f_name in enumerate(feature_names):
76 tree_info = tree_info.replace(f_name, str(f_id))
---> 77 _tree_traversal(
78 tree_info.replace("[f", "").replace("[", "").replace("]", "").split(), lefts, rights, features, thresholds, values
79 )
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in _tree_traversal(tree_info, lefts, rights, features, thresholds, values)
31 count += 1
32 else:
---> 33 features.append(int(tree_info[count].split(":")[1].split("<")[0].replace("[f", "")))
34 thresholds.append(float(tree_info[count].split(":")[1].split("<")[1].replace("]", "")))
35 values.append([-1])
ValueError: invalid literal for int() with base 10: 'subdomain'
# hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config="n_features:12")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_2889/3926699730.py in <module>
----> 1 hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config="n_features:12")
~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in convert_sklearn_xgb_classifier(operator, device, extra_config)
96 assert operator is not None, "Cannot convert None operator"
97 if "n_features" in extra_config:
---> 98 n_features = extra_config["n_features"]
99 else:
100 raise RuntimeError(
ValueError: invalid literal for int() with base 10: 'subdomain'
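For reference, the traceback above accesses extra_config as a dictionary (extra_config["n_features"]), so a string like "n_features:12" will not work. A minimal sketch of the dict form with the top-level converter (the specific key and value are illustrative):

from hummingbird.ml import convert

# extra_config is a dict; the "n_features" key mirrors the lookup shown in the
# traceback above. Passing test_input normally lets Hummingbird infer this itself.
hb_model = convert(model, 'pytorch', X_train[0:1], extra_config={"n_features": 12})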
My XGBoost model trains correctly without any errors, but Hummingbird throws this error while converting.
Can you try to convert the model without extra_config? Something like convert(model, 'torch').
hummingmodel = convert(model, 'torch')
I tried this and it's throwing the same ValueError.
I am passing a DataFrame.
dtypes: float64(5), int64(13)
Are you doing something differently than in this test?
The only difference is that my X and y features are in DataFrame format, and I am not passing any extra_config. Also, I am using a LabelEncoder to encode a few of my features.
model = XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=9,
                      min_child_weight=6, gamma=0.3, subsample=0.8, colsample_bytree=0.8,
                      objective='binary:logistic', reg_lambda=0.1, nthread=4, scale_pos_weight=4,
                      n_jobs=4, verbosity=1)
My model trains successfully without any errors, but the conversion throws a ValueError.
Any update on this?
I am running out of ideas. Can you post a minimal script (if possible) with some dummy data to reproduce the error? I can try to run it myself and see what is happening.
We cannot reproduce your bug. Can you share some of your data?
from hummingbird.ml import convert
import xgboost as xgb
model = xgb.XGBClassifier(learning_rate=0.1, n_estimators=177, max_depth=9, min_child_weight=6, gamma=0.3, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic', reg_lambda=0.1, nthread=4, scale_pos_weight=4, n_jobs=4, verbosity=1)
import numpy as np
num_classes = 2
X = np.random.rand(1000, 28) # these are floats
y = np.random.randint(num_classes, size=1000)
model.fit(X, y)
hb_model = convert(model, 'torch', X[0:1])
>>> print(hb_model)
<hummingbird.ml.containers.sklearn.pytorch_containers.PyTorchSklearnContainerClassification object at 0x7fbcda4fafa0>
>>> xgb.__version__
'1.6.1'
Can you run the above code?
I found the issue: X and y need to be converted into NumPy arrays before training instead of being passed as DataFrames, otherwise it throws this error. When I converted my datasets into NumPy arrays, Hummingbird finally converted the model. Thanks a lot for your help!
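In other words, something along these lines (a minimal sketch; the DataFrame names are placeholders for my own data):

X_np = X_train_df.to_numpy()  # placeholder DataFrame holding the training features
y_np = y_train_df.to_numpy()  # placeholder Series holding the labels
model.fit(X_np, y_np)
hb_model = convert(model, 'torch', X_np[0:1])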
Before closing this thread, I have one quick question: does Hummingbird also support converting scikit-learn pipelines?
OK, glad that it worked in the end. But Hummingbird should also work with pandas DataFrame inputs. For example, this test uses pandas DataFrames as inputs.
Yes, we also support scikit-learn pipelines. Look here for an example of how to use them.
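For example, a minimal sketch of converting a whole pipeline, reusing the dummy X and y from the script above (the scaler/forest combination here is illustrative, not taken from the linked example):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert

# Illustrative pipeline; the individual steps must be operators Hummingbird supports.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", RandomForestClassifier(n_estimators=10))])
pipe.fit(X, y)  # X, y as in the dummy-data script above
hb_pipe = convert(pipe, 'torch', X[0:1])
print(hb_pipe.predict(X[:5]))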
I'm not sure what the issue with the DataFrames is; I can definitely provide a sample of the dataset to reproduce the error.
Feel free to look at our examples of it working with DataFrames, and open a new issue (with code and a full example) if you still have problems. Thanks!
Code:
hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config={"n_features":18})
Error:
XGB Version: 1.6.1 Hummingbird Version: 0.4.7
Any idea about this issue? What other configurations are required to make this work?