Use numpy arrays for scores returned by RandomForestClassifier

onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX

Apache License 2.0

Use numpy arrays for scores returned by RandomForestClassifier #202

Closed maxnoe closed 5 years ago

maxnoe commented 5 years ago

Why are the scores returned by output_probability a list of dictionaries? That differs from the sklearn API and is much less efficient for larger datasets.

prabhat00155 commented 5 years ago

This output schema was decided on while designing onnxruntime, which shipped with WinML (https://docs.microsoft.com/en-us/windows/ai/windows-ml/index). One reason is that Core ML output uses a zipmap-like schema, and one of the main goals at the time was full compatibility with Core ML, e.g. skl.model -> coreml.model -> onnxml == skl.model -> onnxml. At this point the behaviour would be difficult to change: it would break Windows backward compatibility, and a huge test suite would need to be updated.
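For illustration, the zipmap-like output can be pictured as one dictionary per sample, mapping each class label to its probability. The sketch below uses made-up scores and a hypothetical helper, zipmap_to_rows (not part of skl2onnx or onnxruntime), to flatten that structure into dense rows; it assumes every dictionary carries the same set of labels.

```python
def zipmap_to_rows(zipmap_output, class_labels=None):
    """Flatten a list of {label: probability} dicts into dense rows.

    Hypothetical helper for illustration only; it is not a library function.
    """
    if class_labels is None:
        # Assumption: all dicts share the same keys, so the first one
        # defines the columns. Sorting gives a stable column order.
        class_labels = sorted(zipmap_output[0]) if zipmap_output else []
    rows = [[sample[label] for label in class_labels] for sample in zipmap_output]
    return class_labels, rows


# Made-up scores for three samples and two classes.
scores = [
    {0: 0.9, 1: 0.1},
    {0: 0.2, 1: 0.8},
    {0: 0.5, 1: 0.5},
]

labels, rows = zipmap_to_rows(scores)
print(labels)
print(rows)
```

The resulting rows can be handed to numpy.asarray to obtain an array of shape (n_samples, n_classes), comparable to what predict_proba returns in scikit-learn. The conversion walks every dictionary in Python, which is the per-sample overhead the original question is concerned about.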

maxnoe commented 5 years ago

How much effort would it be to make this optional? So keep the current behaviour by default, but support the other one?

prabhat00155 commented 5 years ago

Let me check on this and get back to you. I think there is a way for you to remove the final zipmap node if you want scores similar to scikit-learn's.

maxnoe commented 5 years ago

Thanks!

prabhat00155 commented 5 years ago

You may use select_model_inputs_outputs(). Pass the model as a parameter along with the list of outputs you want, and it will return a modified ONNX model. Here is an example:

from onnxmltools import load_model, save_model
from skl2onnx.helpers.onnx_helper import select_model_inputs_outputs

# Load the converted model, keep only the listed outputs, and save the result.
model = load_model('model.onnx')
save_model(select_model_inputs_outputs(model, ['output_label', 'probabilities']), 'model2.onnx')

maxnoe commented 5 years ago

Awesome, thank you!