onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

[ONNXRuntimeError] : 1 : FAIL : Load model from /kaggle/working/model.onnx failed:Type Error: Type (tensor(double)) of output arg (variable) of node (Co_Concat) does not match expected type (tensor(float)). #1102

Closed DiTo97 closed 5 months ago

DiTo97 commented 5 months ago

When converting the following model:

from typing import Any

import lightgbm
import numpy
import onnx
import onnxruntime
import skl2onnx.sklapi
from onnxmltools.convert.lightgbm.operator_converters.LightGbm import convert_lightgbm
from skl2onnx import update_registered_converter, to_onnx
from skl2onnx.common import shape_calculator
from sklearn import base, multioutput, pipeline, preprocessing

def Normalizer() -> list[tuple[str, Any]]:
    # scale in float64 for numerical stability, then cast back to float32
    # so the converted graph keeps a single tensor(float) element type
    return [
        ("cast64", skl2onnx.sklapi.CastTransformer(dtype=numpy.float64)),
        ("scaler", preprocessing.StandardScaler()),
        ("cast32", skl2onnx.sklapi.CastTransformer(dtype=numpy.float32)),
    ]

def Embedder(**kwargs: Any) -> list[tuple[str, Any]]:
    return [
        ("basemodel", lightgbm.LGBMRegressor(**kwargs))
    ]

def BoL2EmotionV2(
    backbone_kwargs: dict[str, Any] | None = None,
) -> base.BaseEstimator:
    backbone = Embedder(**(backbone_kwargs or {}))
    normalizer = Normalizer()

    model = pipeline.Pipeline([*normalizer, *backbone])

    return multioutput.MultiOutputRegressor(model)

to ONNX with the following conversion code:

update_registered_converter(
    lightgbm.LGBMRegressor,
    "LightGbmLGBMRegressor",
    shape_calculator.calculate_linear_regressor_output_shapes,
    convert_lightgbm,
    options={"split": None},
)

sample = X  # X (the input features) is defined elsewhere in the original notebook

exported = to_onnx(
    model,
    X=numpy.asarray(sample).astype(numpy.float32),
    name="BoL2emotion",
    target_opset={"": 19, "ai.onnx.ml": 2}
)

with open("model.onnx", "wb") as f:
    f.write(exported.SerializeToString())

The inference session created from the exported model works just fine and is well calibrated against the original model:

modelengine = onnxruntime.InferenceSession(
    exported.SerializeToString(), providers=["CPUExecutionProvider"]
)

but loading the model from file results in the error in the title. What might be the problem?

model = onnx.load("/kaggle/working/model.onnx")

or

modelengine = onnxruntime.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)
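
One way to narrow this down (a diagnostic sketch, not from the original report) is to run the ONNX checker on the saved file; full_check=True also runs shape inference, which is where this type mismatch is raised:

import onnx

# load the saved file and validate it, including shape/type inference
model = onnx.load("model.onnx")
onnx.checker.check_model(model, full_check=True)
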
xadupre commented 5 months ago

I assume model = BoL2EmotionV2()?

DiTo97 commented 5 months ago

I assume model = BoL2EmotionV2()?

@xadupre that's right; I forgot to mention it.

xadupre commented 5 months ago

I tried to replicate, but it seems to work for me. The code I wrote is here: #1103. Feel free to add a comment if I did something wrong.

DiTo97 commented 5 months ago

@xadupre, appreciate the code replication.

The code you wrote works for me as well; I likely didn't do a great job of explaining it in the original post.

The problem arises when we save the exported model to a .onnx file and try to instantiate the inference session from the exported file at a later time.

To be clearer, snippet 1 works, snippet 2 does not:

...

# snippet 1: session created from the in-memory bytes (works)
exported = to_onnx(model, ...)

modelengine = onnxruntime.InferenceSession(
    exported.SerializeToString(), providers=["CPUExecutionProvider"]
)

...

# snippet 2: session created from the saved file (fails)
exported = to_onnx(model, ...)

with open("model.onnx", "wb") as f:
    f.write(exported.SerializeToString())

modelengine = onnxruntime.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)

xadupre commented 5 months ago

I tried to replicate, but it is still working for me. Saving the model to disk should not change anything. The error comes from shape_inference. This is the model I get:

(screenshot of the exported model graph)

Can you run the updated unit test? I need to know the versions you are using as well (onnx, scikit-learn, onnxmltools, lightgbm, onnxruntime).

DiTo97 commented 5 months ago

@xadupre, we will replicate the unit test in our own environment and report back ASAP!

@andreaGiacolono is in charge of the experiments.

andreaGiacolono commented 5 months ago

Hi @xadupre, I replicated the unit test and it gave me this error message:

/root/ (unittest.loader._FailedTest) ... ERROR
======================================================================
ERROR: /root/ (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute '/root/'
----------------------------------------------------------------------
Ran 1 test in 0.004s
FAILED (errors=1)

An exception has occurred, use %tb to see the full traceback.
SystemExit: True

/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3561: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

These are the versions: onnx 1.16.1, scikit-learn 1.2.2, lightgbm 4.2.0, onnxmltools 1.12.0, onnxruntime 1.18.0.

xadupre commented 5 months ago

I forgot to ask about the version of numpy. Did you run just one test or several of them? It seems the way you ran the test is not right. Can you run pytest <test_file>?

andreaGiacolono commented 5 months ago

The numpy version is 1.26.4; I'll try with pytest.

xadupre commented 5 months ago

scikit-learn uses double by default. ONNX uses float by default and is strongly typed, and the converter uses the input type to guess the output type. You can read https://onnx.ai/sklearn-onnx/auto_examples/plot_cast_transformer.html; inserting a CastTransformer in your pipeline should fix it.
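
For reference, a minimal sketch of that pattern (the same one the Normalizer above already uses; names here are illustrative, not from this issue):

import numpy
from skl2onnx.sklapi import CastTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# cast up so the scaler runs in double precision, then cast back down
# so the converted graph keeps a single tensor(float) element type
pipe = Pipeline([
    ("cast64", CastTransformer(dtype=numpy.float64)),
    ("scaler", StandardScaler()),
    ("cast32", CastTransformer(dtype=numpy.float32)),
])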

andreaGiacolono commented 5 months ago

We ran the export and the InferenceSession again and it seems to be working. For now we can close the issue, but we will continue to monitor. Thank you!

xadupre commented 5 months ago

:)