microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License

AttributeError: 'XGBClassifier' object has no attribute 'raw_operator' #670

Closed dintellect closed 1 year ago

dintellect commented 1 year ago

Code:

hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config={"n_features":18})

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_2889/1708670718.py in <module>
      1 # Use Hummingbird to convert the model to PyTorch
----> 2 hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config={"n_features":18})

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in convert_sklearn_xgb_classifier(operator, device, extra_config)
    102              Please pass "n_features:N" as extra configuration to the converter or fill a bug report.'
    103         )
--> 104     tree_infos = operator.raw_operator.get_booster().get_dump()
    105     n_classes = operator.raw_operator.n_classes_
    106 

AttributeError: 'XGBClassifier' object has no attribute 'raw_operator'

XGB Version: 1.6.1 Hummingbird Version: 0.4.7

Any idea about this issue? What other configurations are required to make this work?

interesaaat commented 1 year ago

Hi! I think you are calling the wrong API. If you want to convert an XGBoost model, you can just call hummingbird.ml.convert. Please look at our notebook for an example.

dintellect commented 1 year ago

Also, does the Hummingbird converter only accept features of the int datatype? Some of my features are float values.

ksaur commented 1 year ago

Yes, it also accepts floats, as you can see in the dataset used in this notebook.

dintellect commented 1 year ago

ValueError                                Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_2889/1851059335.py in <module>
----> 1 hummingmodel = convert(model, 'pytorch',X_train[0:1])

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in convert(model, backend, test_input, device, extra_config)
    442     """
    443     assert constants.REMAINDER_SIZE not in extra_config
--> 444     return _convert_common(model, backend, test_input, device, extra_config)
    445 
    446 

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_common(model, backend, test_input, device, extra_config)
    392 
    393     if type(model) in xgb_operator_list:
--> 394         return _convert_xgboost(model, backend_formatted, test_input, device, extra_config)
    395 
    396     if type(model) in lgbm_operator_list:

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_xgboost(model, backend, test_input, device, extra_config)
    151                 Please pass some test_input to the converter."
    152         )
--> 153     return _convert_sklearn(model, backend, test_input, device, extra_config)
    154 
    155 

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/convert.py in _convert_sklearn(model, backend, test_input, device, extra_config)
    109 
    110     # Convert the Topology object into a PyTorch model.
--> 111     hb_model = topology_converter(topology, backend, test_input, device, extra_config=extra_config)
    112     return hb_model
    113 

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/_topology.py in convert(topology, backend, test_input, device, extra_config)
    220                 )
    221 
--> 222         operator_map[operator.full_name] = converter(operator, device, extra_config)
    223 
    224     # Set the parameters for the model / container

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in convert_sklearn_xgb_classifier(operator, device, extra_config)
    105     n_classes = operator.raw_operator.n_classes_
    106 
--> 107     return convert_gbdt_classifier_common(
    108         operator, tree_infos, _get_tree_parameters, n_features, n_classes, decision_cond="<", extra_config=extra_config
    109     )

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_gbdt_commons.py in convert_gbdt_classifier_common(operator, tree_infos, get_tree_parameters, n_features, n_classes, classes, extra_config, decision_cond)
     68         tree_infos = [tree_infos[i * n_classes + j] for j in range(n_classes) for i in range(len(tree_infos) // n_classes)]
     69 
---> 70     return convert_gbdt_common(
     71         operator, tree_infos, get_tree_parameters, n_features, classes, extra_config=extra_config, decision_cond=decision_cond
     72     )

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_gbdt_commons.py in convert_gbdt_common(operator, tree_infos, get_tree_parameters, n_features, classes, extra_config, decision_cond)
     94     assert n_features is not None
     95 
---> 96     tree_parameters, max_depth, tree_type = get_tree_params_and_type(tree_infos, get_tree_parameters, extra_config)
     97 
     98     # Apply learning rate directly on the values rather then at runtime.

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_tree_commons.py in get_tree_params_and_type(tree_infos, get_tree_parameters, extra_config)
    221         The tree parameters, the maximum tree-depth and the tre implementation to use
    222     """
--> 223     tree_parameters = [get_tree_parameters(tree_info, extra_config) for tree_info in tree_infos]
    224     max_depth = max(1, _find_max_depth(tree_parameters))
    225     tree_type = get_tree_implementation_by_config_or_depth(extra_config, max_depth)

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/_tree_commons.py in <listcomp>(.0)
    221         The tree parameters, the maximum tree-depth and the tre implementation to use
    222     """
--> 223     tree_parameters = [get_tree_parameters(tree_info, extra_config) for tree_info in tree_infos]
    224     max_depth = max(1, _find_max_depth(tree_parameters))
    225     tree_type = get_tree_implementation_by_config_or_depth(extra_config, max_depth)

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in _get_tree_parameters(tree_info, extra_config)
     75         for f_id, f_name in enumerate(feature_names):
     76             tree_info = tree_info.replace(f_name, str(f_id))
---> 77     _tree_traversal(
     78         tree_info.replace("[f", "").replace("[", "").replace("]", "").split(), lefts, rights, features, thresholds, values
     79     )

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in _tree_traversal(tree_info, lefts, rights, features, thresholds, values)
     31             count += 1
     32         else:
---> 33             features.append(int(tree_info[count].split(":")[1].split("<")[0].replace("[f", "")))
     34             thresholds.append(float(tree_info[count].split(":")[1].split("<")[1].replace("]", "")))
     35             values.append([-1])

ValueError: invalid literal for int() with base 10: 'subdomain'

# hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config="n_features:12")

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/var/folders/f2/9tbmpg411hndwc482xn850br0000gn/T/ipykernel_2889/3926699730.py in <module>
----> 1 hummingmodel = hummingbird.ml.operator_converters.xgb.convert_sklearn_xgb_classifier(model, 'pytorch',extra_config="n_features:12")

~/opt/anaconda3/lib/python3.9/site-packages/hummingbird/ml/operator_converters/xgb.py in convert_sklearn_xgb_classifier(operator, device, extra_config)
     96     assert operator is not None, "Cannot convert None operator"
     97     if "n_features" in extra_config:
---> 98         n_features = extra_config["n_features"]
     99     else:
    100         raise RuntimeError(

ValueError: invalid literal for int() with base 10: 'subdomain'

My XGBoost model trains correctly without any error, but Hummingbird throws this error during conversion.
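The ValueError in the first traceback above is raised where `_tree_traversal` calls `int()` on the feature part of a split condition. The sketch below reproduces only that string handling, copied from the two source lines visible in the traceback, to show how a feature name left in the tree dump produces the same message. The two dump lines are illustrative assumptions, not taken from the reporter's model, and the helper names `tokenize` and `parse_split_token` are hypothetical. The thread does not establish why the name was still present in the dump.

```python
def tokenize(dump_line):
    # Same cleanup the traceback shows in _get_tree_parameters:
    # strip "[f", "[" and "]" and split on whitespace.
    return dump_line.replace("[f", "").replace("[", "").replace("]", "").split()


def parse_split_token(token):
    # Same expressions the traceback shows in _tree_traversal:
    # "node_id:feature<threshold" -> (feature index, threshold).
    condition = token.split(":")[1]
    feature = int(condition.split("<")[0].replace("[f", ""))
    threshold = float(condition.split("<")[1].replace("]", ""))
    return feature, threshold


# Illustrative dump lines (assumed): one with a default "f3" feature id,
# one where a column name appears instead.
default_line = "0:[f3<0.5] yes=1,no=2,missing=1"
named_line = "0:[subdomain<0.5] yes=1,no=2,missing=1"

default_result = parse_split_token(tokenize(default_line)[0])

try:
    parse_split_token(tokenize(named_line)[0])
    named_error = None
except ValueError as exc:
    named_error = str(exc)

print(default_result)
print(named_error)
```

If the model is trained on inputs without column names, the dump uses ids such as `f3` and this parse succeeds.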

interesaaat commented 1 year ago

Can you try to convert the model without extra_config? Something like convert(model, 'torch').

dintellect commented 1 year ago

hummingmodel = convert(model, 'torch')

I tried this and it's throwing the ValueError

I am passing the Dataframe.

dtypes: float64(5), int64(13)

interesaaat commented 1 year ago

Are you doing something differently than in this test?

dintellect commented 1 year ago

The only difference is that my X and Y features are in the Dataframe format and I am not passing any extra_config. Also, I am using a Label encoder to encode a few of my features.

model = XGBClassifier(
    learning_rate=0.1, n_estimators=177, max_depth=9, min_child_weight=6,
    gamma=0.3, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic',
    reg_lambda=0.1, nthread=4, scale_pos_weight=4, n_jobs=4, verbosity=1,
)

My model trains successfully without any errors, but the conversion throws a ValueError.

dintellect commented 1 year ago

Any update on this?

interesaaat commented 1 year ago

I am running out of ideas. Can you post a minimal script (if possible) with some dummy data to reproduce the error? I can try to run it myself and see what is happening.

ksaur commented 1 year ago

We cannot reproduce your bug. Can you share some of your data?

import numpy as np
import xgboost as xgb
from hummingbird.ml import convert

model = xgb.XGBClassifier(
    learning_rate=0.1, n_estimators=177, max_depth=9, min_child_weight=6,
    gamma=0.3, subsample=0.8, colsample_bytree=0.8, objective='binary:logistic',
    reg_lambda=0.1, nthread=4, scale_pos_weight=4, n_jobs=4, verbosity=1,
)
num_classes = 2
X = np.random.rand(1000, 28)  # these are floats
y = np.random.randint(num_classes, size=1000)
model.fit(X, y)
hb_model = convert(model, 'torch', X[0:1])
>>> print(hb_model)
<hummingbird.ml.containers.sklearn.pytorch_containers.PyTorchSklearnContainerClassification object at 0x7fbcda4fafa0>
>>> xgb.__version__
'1.6.1'

Can you run the above code?

dintellect commented 1 year ago

I found the issue: X and y need to be converted into NumPy arrays before training instead of being passed as Dataframes. Otherwise, the conversion throws this error. Once I converted my datasets into NumPy arrays, Hummingbird converted the model. Thanks a lot for your help!
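The workaround described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reporter's actual code: the column names, the toy data, the small model settings, and the helper names `to_numpy_inputs` and `fit_and_convert` are all hypothetical. The xgboost and hummingbird imports are kept inside `fit_and_convert` so the array-conversion step can be run on its own.

```python
import numpy as np
import pandas as pd


def to_numpy_inputs(features, labels):
    # Drop the column names by converting the DataFrame / Series
    # to plain NumPy arrays before training.
    return features.to_numpy(), np.asarray(labels)


def fit_and_convert(features, labels):
    # Imported lazily so the conversion helper above has no hard
    # dependency on xgboost or hummingbird.
    import xgboost as xgb
    from hummingbird.ml import convert

    X, y = to_numpy_inputs(features, labels)
    model = xgb.XGBClassifier(n_estimators=10, max_depth=3)
    model.fit(X, y)
    # Same call shape as in the maintainers' example: one row as test input.
    return convert(model, "torch", X[0:1])


# Toy data with mixed int and float columns (assumed for illustration).
frame = pd.DataFrame({"subdomain": [0, 1, 2, 1], "length": [0.5, 1.5, 2.5, 3.5]})
target = pd.Series([0, 1, 0, 1])
X, y = to_numpy_inputs(frame, target)
print(X.dtype, X.shape, y.shape)
```

Calling `fit_and_convert(frame, target)` requires both xgboost and hummingbird to be installed.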

dintellect commented 1 year ago

Before closing this thread, I have one quick question: does Hummingbird also support converting Scikit-learn pipelines?

interesaaat commented 1 year ago

OK, glad that it worked in the end. But Hummingbird should also work with Pandas Dataframe inputs. For example, this test uses Pandas Dataframes as inputs.

interesaaat commented 1 year ago

Yes, we also support Sklearn pipelines. Look here for an example of how to use them.

dintellect commented 1 year ago

I am not sure what the issue with the Dataframes is. I can definitely provide you with a sample of the dataset to reproduce the error.

ksaur commented 1 year ago

Feel free to look at our examples of it working with Dataframes, and open a new issue (with code and a full example) if you still have problems. Thanks!