Closed gespriella closed 3 years ago
The converter does not support string labels in that case. Adding y = y.astype(np.int64) before training fixes your issue.
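A minimal sketch of that workaround on synthetic data (the small arrays below are stand-ins for the MNIST data; the full reproduction is further down in this thread):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the MNIST data: fetch_openml returns y as an array of strings
X = np.random.RandomState(0).rand(20, 4).astype(np.float32)
y = np.array(['0', '1'] * 10)

# Cast the string labels to integers before training, so the
# converter can infer an ONNX type for the output labels
y = y.astype(np.int64)
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
```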
Thank you, it works fine when casting to integers. I also noticed a slight difference between the probabilities returned by the same KNN model exported as ONNX and as Pickle. Do you know why this might be happening? I'm giving a short presentation at my university on using models trained in Python from C# via ONNX, and I want to be able to explain this difference.
I tested RandomTrees, ExtraTrees, and LogisticRegression, and they all produce the same results, but KNN doesn't:
I'm using KNeighborsClassifier(n_neighbors=6, weights="distance").fit(X, y) for training in Python, onx = convert_sklearn(MnistModelKNN, initial_types=[('input', FloatTensorType([1, X.shape[1]]))]) for saving to ONNX, and the following to predict in C#:
var tensor = new DenseTensor<float>(floatArray, inferenceSession.InputMetadata["input"].Dimensions);
var results = inferenceSession.Run(new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor("input", tensor) }).ToArray();
You could use double instead of float. The ONNX graph for KNeighborsClassifier (http://www.xavierdupre.fr/app/mlprodict/helpsphinx/skl_converters/visual-neighbors-001.html) includes a Reciprocal node, and everything related to matrix inversion usually increases the probability of discrepancies. We could probably switch to double in the middle of the pipeline by adding an explicit option. You'll find more details here: http://onnx.ai/sklearn-onnx/auto_tutorial/plot_ebegin_float_double.html.
Thanks very much for your response. I'll test out the mlprodict options at the second link as soon as I can. In the meantime, it seems that using KNeighborsClassifier() without the hyperparameters I had explicitly added (n_neighbors=6 and weights='distance') shows no discrepancies. I wonder if weights='distance' is what's triggering the discrepancy in this case, since it "weight[s] points by the inverse of their distance".
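As a rough illustration (a sketch of the math, not the converter's actual code path), weights="distance" weights each neighbor by the reciprocal of its distance, which is exactly where a Reciprocal step enters the computation:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(50, 2)
y = (X[:, 0] > 0.5).astype(np.int64)

knn = KNeighborsClassifier(n_neighbors=6, weights="distance").fit(X, y)

# Distance weighting boils down to 1/d for each neighbor distance d,
# i.e. the operation the Reciprocal node in the ONNX graph performs
d, idx = knn.kneighbors([[0.5, 0.5]])
w = 1.0 / d
# Per-class probability = normalized sum of the neighbor weights per class
proba = np.array([w[0][y[idx[0]] == c].sum() for c in knn.classes_]) / w.sum()
```

With near-zero distances, 1/d becomes very large, which is why small float rounding errors get amplified under this weighting.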
Update: actually, it seems very likely that weights='distance' was to blame, since the Reciprocal node disappeared once I left it at the default (uniform):
Definitely yes! Any time an inverse is used, discrepancies usually appear, mostly because near-zero values are present in most cases. The spread in orders of magnitude between small and big values is usually greater in the inverse matrix than in the original matrix. I should probably add a section to the link above studying (1 / (float)x) - (float)(1 / x).
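That suggested study of (1 / (float)x) - (float)(1 / x) can be sketched with NumPy, using float32 to stand in for C's float:

```python
import numpy as np

x = 3.0
# Reciprocal computed entirely in single precision...
r32 = np.float32(1.0) / np.float32(x)
# ...versus computed in double precision, then rounded to single
r64_then_32 = np.float32(1.0 / x)
# Discrepancy between the double-precision reciprocal and the
# single-precision one, widened back to double for the subtraction
diff = 1.0 / x - float(r32)
```

For x = 3.0 the difference is on the order of 1e-8; it is small in absolute terms, but it grows as x approaches zero, which is the situation distance weighting creates.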
May I close the issue?
Yes, thank you!
Hi, I'm getting this error when trying to save my KNN model as ONNX while y has string values (that's how it's provided by the fetch_openml call). It works fine with LogisticRegression, but fails with KNN on conversion. It also works if I cast the y values to ints, but my guess is it should work without needing to do that.
NotImplementedError: Unable to guess ONNX type from type object. You may raise an issue at https://github.com/onnx/sklearn-onnx/issues.
The code to reproduce is:
from sklearn.datasets import fetch_openml
from sklearn.neighbors import KNeighborsClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
MnistModel = KNeighborsClassifier(n_neighbors=6, weights="distance")
MnistModel.fit(X, y)
initial_types = [('input', FloatTensorType([1, X.shape[1]]))]
onx = convert_sklearn(MnistModel, initial_types=initial_types)
with open("MnistLR.onnx", "wb") as file:
    file.write(onx.SerializeToString())
It fails on convert_sklearn.
Here's the debug info:
NotImplementedError Traceback (most recent call last)