microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License

[ONNX Converter] Allow for specification of ONNX Initializers #433

Open luisquintanilla opened 3 years ago

luisquintanilla commented 3 years ago

I have a regression model that was trained with ML.NET and exported to ONNX. When I try to optimize the ONNX model with Hummingbird, I get an error that seems to be caused by incompatible operators or data types.

Given the following Python code:

import onnx
import numpy as np
from hummingbird.ml import convert

# Define model path
model_path = "taxi-fare.onnx"

# Load ONNX model
onnx_model = onnx.load_model(model_path)

# Define sample input
# Note: mixing strings and floats in one tuple makes NumPy store every value as a string
test_data = np.array([("CMT", 1.0, 1.0, 1.0, 1.0, "CRD", 1.0)])

# Convert to HB-ONNX
hb_onnx = convert(onnx_model, "onnx", test_input=test_data)

# Save HB-ONNX model
hb_onnx.save("hb-taxi-fare")

The result is the following.

Traceback (most recent call last):
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_topology.py", line 153, in convert
    converter = get_converter(operator.type)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/onnxconverter_common/registration.py", line 40, in get_converter
    raise ValueError('Unsupported conversion for operator %s' % operator_name)
ValueError: Unsupported conversion for operator None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convert.py", line 21, in <module>
    hb_onnx = convert(onnx_model,"onnx", test_input=test_data)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 431, in convert
    return _convert_common(model, backend, test_input, device, extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 387, in _convert_common
    return _convert_onnxml(model, backend, test_input, device, extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 260, in _convert_onnxml
    hb_model = topology_converter(topology, backend, test_input, device, extra_config=extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_topology.py", line 165, in convert
    raise MissingConverter(
hummingbird.ml.exceptions.MissingConverter: Unable to find converter for None type <class 'NoneType'> with extra config: {'test_input': array([['CMT', '1.0', '1.0', '1.0', '1.0', 'CRD', '1.0']], dtype='<U3'), 'container': True, 'n_threads': 4, 'n_features': 7, 'onnx_initializers': {'uint64': data_type: 12
name: "uint64"
uint64_data: 16
, 'int64': data_type: 7
int64_data: 1
name: "int64"
, 'uint640': data_type: 12
name: "uint640"
uint64_data: 16
, 'int640': data_type: 7
int64_data: 1
name: "int640"
, 'mlnet.vendor_id.SlotNames': dims: 1
dims: 1
data_type: 8
string_data: "one"
name: "mlnet.vendor_id.SlotNames"
, 'mlnet.payment_type.SlotNames': dims: 1
dims: 1
data_type: 8
string_data: "one"
name: "mlnet.payment_type.SlotNames"
, 'mlnet.Features.SlotNames': dims: 1
dims: 1
data_type: 8
string_data: "one"
name: "mlnet.Features.SlotNames"
}, 'tree_implementation': 'tree_trav', 'max_string_length': 4}.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter implemented.
Please fill an issue at https://github.com/microsoft/hummingbird.

It's unclear how to specify ONNX initializers. I can see how this is done for scikit-learn models, but given the data sample above, is there a way to specify the initializers for a model that has already been exported to ONNX? I don't believe ML.NET offers an option to specify the initial types ahead of time, so it would be great to be able to specify them after the fact. It looks like this might be possible through the extra_config parameter, but from the output above it's not clear what type or format the value should have.

Attached is the ONNX model

taxi-fare.zip

interesaaat commented 3 years ago

Thanks @luisquintanilla for reporting this. I think the problem is that we don't currently support string inputs for ONNX models. We already support them for scikit-learn models, so it is just a matter of enabling this. I will work on it.
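Part of this is visible in the first traceback: test_input arrives as array([['CMT', '1.0', ...]], dtype='<U3'), so even the numeric features are strings. This is standard NumPy behaviour when a single array is built from a tuple that mixes str and float values. The sketch below reproduces it and shows one way to keep the two kinds of columns in separately typed arrays. The column positions are inferred from the sample row, and whether Hummingbird's ONNX path accepts split inputs is not established in this thread.

```python
import numpy as np

# Sample row from the issue; positions 0 and 5 hold the categorical columns
# (assumed to be vendor_id and payment_type, per the initializer names in the traceback).
row = ("CMT", 1.0, 1.0, 1.0, 1.0, "CRD", 1.0)

# One array built from a mixed tuple gets a single common dtype: Unicode strings.
mixed = np.array([row])
print(mixed.dtype.kind, mixed.shape)  # U (1, 7)
print(str(mixed[0, 1]))               # 1.0 (stored as a string, not a float)

# Hypothetical alternative: keep string and numeric columns in separately typed arrays.
string_cols = [0, 5]
numeric_cols = [i for i in range(len(row)) if i not in string_cols]
strings = np.array([[row[i] for i in string_cols]], dtype=np.str_)
numbers = np.array([[row[i] for i in numeric_cols]], dtype=np.float32)
print(strings.shape, numbers.dtype, numbers.shape)  # (1, 2) float32 (1, 5)
```

The exact string width NumPy picks for the mixed array can differ between NumPy versions; only the fact that everything becomes a string matters here.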

luisquintanilla commented 3 years ago

Thanks for the quick response @interesaaat. I'll try again using only numerical values to check whether I run into the same issue.

luisquintanilla commented 3 years ago

No luck using only numbers. Here are the updated code and model. In this case, I think the issue is the Label column: I'm using a label encoder because, although the label is a number, it's a categorical value.

Code:

import onnx
import numpy as np
from hummingbird.ml import convert

# Define model path
model_path = "iris-classification.onnx"

# Load ONNX model
onnx_model = onnx.load_model(model_path)

# Define sample input
test_data = np.array([(1.0, 1.0, 1.0, 1.0, 1.0)])

# Convert to HB-ONNX
hb_onnx = convert(onnx_model, "onnx", test_input=test_data)

# Save HB-ONNX model
hb_onnx.save("hb-iris-classification")

Console Output:

Traceback (most recent call last):
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_topology.py", line 153, in convert
    converter = get_converter(operator.type)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/onnxconverter_common/registration.py", line 40, in get_converter
    raise ValueError('Unsupported conversion for operator %s' % operator_name)
ValueError: Unsupported conversion for operator None

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convert.py", line 15, in <module>
    hb_onnx = convert(onnx_model,"onnx",test_input=input)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 431, in convert
    return _convert_common(model, backend, test_input, device, extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 387, in _convert_common
    return _convert_onnxml(model, backend, test_input, device, extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 260, in _convert_onnxml
    hb_model = topology_converter(topology, backend, test_input, device, extra_config=extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_topology.py", line 165, in convert
    raise MissingConverter(
hummingbird.ml.exceptions.MissingConverter: Unable to find converter for None type <class 'NoneType'> with extra config: {'test_input': array([[1., 1., 1., 1., 1.]]), 'container': True, 'n_threads': 4, 'n_features': 5, 'onnx_initializers': {'one': data_type: 1
float_data: 1.0
name: "one"
, 'oneInt': data_type: 6
int32_data: 1
name: "oneInt"
, 'zero': data_type: 1
float_data: 0.0
name: "zero"
, 'labelCount': data_type: 1
float_data: 3.0
name: "labelCount"
, 'totalTrainingCount': data_type: 1
float_data: 150.0
name: "totalTrainingCount"
, 'labelHistogram': dims: 3
dims: 1
data_type: 1
float_data: 50.0
float_data: 50.0
float_data: 50.0
name: "labelHistogram"
, 'featureHistogram': dims: 3
dims: 4
data_type: 1
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
name: "featureHistogram"
, 'labelHistogramExpanded': dims: 4
dims: 3
data_type: 1
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
float_data: 50.0
name: "labelHistogramExpanded"
, 'absentFeaturesLogProb': dims: 3
dims: 1
data_type: 11
name: "absentFeaturesLogProb"
double_data: -15.881167654208488
double_data: -15.881167654208488
double_data: -15.881167654208488
, 'mlnet.Features.SlotNames': dims: 1
dims: 1
data_type: 8
string_data: "one"
name: "mlnet.Features.SlotNames"
}, 'tree_implementation': 'tree_trav', 'max_string_length': 12}.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter implemented.
Please fill an issue at https://github.com/microsoft/hummingbird.

Model attached

iris-classification.zip

luisquintanilla commented 3 years ago

I tried a regression sample where all inputs are numerical and got the following:

Code:

import onnx
import numpy as np
from hummingbird.ml import convert

# Define model path
model_path = "real-estate-price.onnx"

# Load ONNX model
onnx_model = onnx.load_model(model_path)

# Define sample input
test_data = np.array([(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)])

# Convert to HB-ONNX
hb_onnx = convert(onnx_model, "onnx", test_input=test_data)

# Save HB-ONNX model
hb_onnx.save("hb-real-estate-price")

Console output:

Traceback (most recent call last):
  File "convert.py", line 15, in <module>
    hb_onnx = convert(onnx_model,"onnx",test_input=input)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 431, in convert
    return _convert_common(model, backend, test_input, device, extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 387, in _convert_common
    return _convert_onnxml(model, backend, test_input, device, extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/convert.py", line 260, in _convert_onnxml
    hb_model = topology_converter(topology, backend, test_input, device, extra_config=extra_config)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_topology.py", line 185, in convert
    executor = Executor(
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_executor.py", line 57, in __init__
    self._input_names = _fix_var_naming(operators, input_names)
  File "/anaconda/envs/hb-mlnet-onnx/lib/python3.8/site-packages/hummingbird/ml/_executor.py", line 54, in _fix_var_naming
    new_names.append(map[name])
KeyError: 'Label'

Model attached: real-estate-price.zip

interesaaat commented 3 years ago

@luisquintanilla is it possible to export the models so that only the actual model that needs to be evaluated is contained in the ONNX file? I see that these models contain several separate subgraphs for labels, etc., and I think this is what is causing the problems.

luisquintanilla commented 3 years ago

@interesaaat I'm not sure, but it's definitely something I can look into.

interesaaat commented 3 years ago

Hi @luisquintanilla, were you able to look into it? Otherwise I will start with your models and see how I can remove unnecessary operators within Hummingbird.

luisquintanilla commented 3 years ago

Hi @interesaaat,

Unfortunately, I haven't had a chance yet. I might be able to tackle this sometime in the next two weeks.

interesaaat commented 3 years ago

Thanks @luisquintanilla. Please keep us posted.