microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

XGBoost incremental training, issue with ONNX Conversion #18841

Open kiransarv opened 8 months ago

kiransarv commented 8 months ago

Describe the issue

I trained an XGBoost classifier incrementally, one batch at a time:

    batch_size = 1024
    print(vectors.shape, labels.shape, len(np.unique(labels)))
    self.model: XGBClassifier = XGBClassifier(**self.init_param)
    for start in range(0, vectors.shape[0], batch_size):
        itr_vector = vectors[start : start + batch_size]
        itr_label = labels[start : start + batch_size]
        if start == 0:
            # first batch: train from scratch
            self.model.fit(itr_vector, itr_label, **fit_params)
        else:
            # later batches: continue training from the model fitted so far
            fit_params["xgb_model"] = self.model
            self.model.fit(itr_vector, itr_label, **fit_params)

The converted ONNX model then fails at inference time with:

    RUNTIME_EXCEPTION : Non-zero status code returned while running TreeEnsembleClassifier node. Name:'TreeEnsembleClassifier' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/ml/tree_ensemble_aggregator.h:201 void onnxruntime::ml::detail::TreeAggregatorSum<InputType, ThresholdType, OutputType>::ProcessTreeNodePrediction(onnxruntime::InlinedVector<onnxruntime::ml::detail::ScoreValue >&, const onnxruntime::ml::detail::TreeNodeElement&, gsl::span<const onnxruntime::ml::detail::SparseValue >) const [with InputType = float; ThresholdType = float; OutputType = float; onnxruntime::InlinedVector<onnxruntime::ml::detail::ScoreValue > = absl::lts_20220623::InlinedVector<onnxruntime::ml::detail::ScoreValue, 6, std::allocator<onnxruntime::ml::detail::ScoreValue > >] it->i < (int64_t)predictions.size() was false.

If the model is not trained incrementally, i.e. it is fitted only once with self.model.fit(vectors, labels, **fit_params), there is no issue with the ONNX model and predictions work fine.
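One hypothesis worth ruling out, not confirmed anywhere in this thread: if some batch does not contain every class, the classifier refitted on that batch may end up declaring fewer classes than the trees built on earlier batches refer to. A minimal sketch of that check, assuming `labels` is a 1-D sequence of integer class ids (a list or a NumPy array); the helper name `batches_missing_classes` is made up for illustration:

```python
from typing import List, Sequence, Tuple


def batches_missing_classes(
    labels: Sequence[int], batch_size: int
) -> List[Tuple[int, List[int]]]:
    """Return (start_index, missing_classes) for every batch that lacks a class.

    Batches are cut the same way as in the training loop:
    labels[start : start + batch_size].
    """
    all_classes = sorted(set(labels))
    report = []
    for start in range(0, len(labels), batch_size):
        seen = set(labels[start : start + batch_size])
        missing = [c for c in all_classes if c not in seen]
        if missing:
            report.append((start, missing))
    return report


if __name__ == "__main__":
    # Toy labels: the third batch of size 3 never sees class 2.
    toy_labels = [0, 1, 2, 0, 1, 2, 0, 0, 1]
    print(batches_missing_classes(toy_labels, batch_size=3))  # [(6, [2])]
```

An empty result means every batch saw every class, in which case this hypothesis can be dropped.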

To reproduce

Steps are detailed above.

Urgency

No response

Platform

Mac

OS Version

MacOS Ventura

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

baijumeswani commented 8 months ago

@xadupre would you please help with this issue?

xadupre commented 8 months ago

This error means that a leaf returns a class index outside the expected number of classes. The attribute classlabels_int64s is probably shorter than max(class_ids), but I wonder why that would happen. I'll need to know the versions you used to train and convert the model (xgboost and onnxmltools).
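The mismatch described above can be checked directly on the converted file. This is only a sketch: it assumes the `onnx` package is installed and that the model contains a TreeEnsembleClassifier node carrying the `class_ids` and `classlabels_int64s` attributes; the function names and the `model.onnx` path are hypothetical.

```python
from typing import Dict, List, Sequence


def out_of_range_class_ids(
    class_ids: Sequence[int], class_labels: Sequence[int]
) -> List[int]:
    """Return the distinct class ids that index past the end of class_labels."""
    n_classes = len(class_labels)
    return sorted({i for i in class_ids if i < 0 or i >= n_classes})


def tree_classifier_attrs(model_path: str) -> Dict[str, List[int]]:
    """Read class_ids and classlabels_int64s from the first
    TreeEnsembleClassifier node found in the ONNX graph."""
    # Imported lazily so the pure-Python check above works without onnx.
    import onnx
    from onnx import helper

    model = onnx.load(model_path)
    for node in model.graph.node:
        if node.op_type == "TreeEnsembleClassifier":
            attrs = {a.name: helper.get_attribute_value(a) for a in node.attribute}
            return {
                "class_ids": list(attrs.get("class_ids", [])),
                "classlabels_int64s": list(attrs.get("classlabels_int64s", [])),
            }
    raise ValueError("no TreeEnsembleClassifier node found")


if __name__ == "__main__":
    # Toy values: a leaf votes for class index 3 but only 3 labels are declared.
    print(out_of_range_class_ids([0, 1, 2, 3], [0, 1, 2]))  # [3]
    # On a real file (hypothetical path):
    # attrs = tree_classifier_attrs("model.onnx")
    # print(out_of_range_class_ids(attrs["class_ids"], attrs["classlabels_int64s"]))
```

A non-empty result for a multi-class model would match the failed `it->i < (int64_t)predictions.size()` check in the error message.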

kiransarv commented 8 months ago

XGBoost version 2.0.2, ONNX version 1.16.1

xadupre commented 8 months ago

What about onnxmltools?

kiransarv commented 8 months ago

onnxmltools 1.11.2

xadupre commented 8 months ago

Is it possible to try with 1.12.0? We released it last month. It fixes some bugs with xgboost >= 2.0.

kiransarv commented 8 months ago

Sure, thanks.

kiransarv commented 8 months ago

Same error even after upgrading

xadupre commented 8 months ago

Thanks for trying. I'll try to replicate your issue unless you already have a full script to share.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

addisonklinke commented 1 month ago

the attribute classlabels_int64s probably shorter than max(class_ids)

Thanks for the tip @xadupre. I've been trying to convert a PySpark XGBoost model, and because it doesn't have the .classes_ attribute from the sklearn implementation, I had to fill that attribute myself. Initially I had the column names hardcoded, and then I realized I was fitting on one column and setting the attribute from another, which would indeed lead to len(classlabels_int64s) != max(class_ids).