onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Issue using String features with Lightgbm #984

Open aleeminati opened 1 year ago

aleeminati commented 1 year ago

Hi!

I am trying to convert a LightGBM model to ONNX, but my inputs are string features. As I understand it, a LightGBM model can only be converted to ONNX if its inputs are floats. To work around this, I use an OrdinalEncoder in a scikit-learn pipeline, so the pipeline is an ordinal encoder followed by the LightGBM model. The pipeline fits and predicts fine, but when I call convert_sklearn(), I get: cannot convert 'A3817D' to float. That value belongs to a categorical feature, which should be ordinally encoded. Why am I getting this error?

Example code (the original code is in a remote environment):

    model = LGBMClassifier()
    scaler = OrdinalEncoder()

    pipe = Pipeline(steps=[('scaler', scaler), ('model', model)])

    pipe.fit(X_train, y_train)  # works fine

    update_registered_converters(...)
    model_onnx = convert_sklearn(pipe, 'pipeline', [('input', FloatTensorType(X_train.shape[0]))])  # runs into an error

Error: Cannot convert string to float: 'A3817D'

Any help will be appreciated.

xadupre commented 1 year ago

I assume your data is made of strings, not floats. You should replace FloatTensorType with StringTensorType.

aleeminati commented 1 year ago

The problem is that LightGBM doesn't accept StringTensorType. I was able to solve it by creating two models: one with only the data pipeline and one with LightGBM. I update the converters, convert each model to ONNX separately, and then merge them using merge_models.

ArlanCooper commented 2 weeks ago

Does anyone have a solution?