Closed PFloyd0 closed 2 years ago
The Java stack trace indicates that the K-neighbors regressor object inconsistency is detected on line 71 of `KNeighborsUtil.java`.
This does not match the latest state of the SkLearn2PMML/JPMML-SkLearn stack (i.e. it should be line 65): https://github.com/jpmml/jpmml-sklearn/blob/1.7.1/pmml-sklearn/src/main/java/sklearn/neighbors/KNeighborsUtil.java#L65
In other words, please upgrade your SkLearn2PMML package version to the latest (should be 0.78.1), and re-run your experiment.
Hello, I have already updated my sklearn2pmml package from 0.78.0 to 0.78.1 but this problem still occurs. Look forward to your reply. Thank you!
@PFloyd0 Take a look at the Java exception stack trace that you just posted - it still points to line 71.
It means that your SkLearn2PMML package update didn't work.
@PFloyd0 My bad, I'm taking back the above comment.
The SkLearn2PMML package is currently based on JPMML-SkLearn 1.6.X codebase (not the recent 1.7.X codebase), so we use legacy line numbers: https://github.com/jpmml/jpmml-sklearn/blob/1.6.X/src/main/java/sklearn/neighbors/KNeighborsUtil.java#L71
In short, there is a mismatch between the `KNeighborsRegressor._fit_X` shape (reports 60,000 instances) and the `KNeighborsRegressor._y` attribute (reports 30,000 instances).
Can you print the values of these two attributes, and check whether you see the same mismatch in your Python environment?
Second, what are your `offline_rss` and `offline_location` data matrix types? They don't seem to be Pandas data matrix types (`pandas.DataFrame` and `pandas.Series`, respectively), because the `PMMLPipeline.fit(X, y)` method is unable to extract feature and target names.
Are they raw Numpy arrays?
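For reference, a minimal sketch of what wrapping raw Numpy arrays into Pandas types could look like, so that feature and target names become available. The array contents, the column names (`rss_1` to `rss_4`) and the target name `x` are made up for illustration. A `pandas.Series` holds a single column, so only the first coordinate is used here.

```python
import numpy as np
import pandas as pd

# Stand-ins for the arrays loaded from the .mat file (random placeholder data).
offline_rss = np.random.rand(6, 4)
offline_location = np.random.rand(6, 2)

# Hypothetical column names; any unique strings would do.
X = pd.DataFrame(offline_rss, columns=["rss_1", "rss_2", "rss_3", "rss_4"])

# A Series is one-dimensional, so take a single coordinate column as the target.
y = pd.Series(offline_location[:, 0], name="x")

print(X.columns.tolist())
print(y.name)
```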
TLDR: What gets printed to console?
```python
print(knn_reg._fit_X.shape)
print(knn_reg._y.shape) # Alternatively, do `len(knn_reg._y)`
print(offline_rss.shape)
print(offline_rss.__class__)
print(offline_location.shape) # Alternatively, do `len(offline_location)`
print(offline_location.__class__)
```
Yes, they are raw Numpy arrays. I load them from a .mat file. Should I convert them to a DataFrame before training?
`print(offline_location.shape)` prints `(30000, 2)`
This is the error - the shape of the `y` variable is `(30000, 2)` (i.e. 60k values), but it should be `(30000, 1)` (i.e. 30k values).
Is this intentional - are you trying to fit a multi-output KNN regressor model?
The SkLearn2PMML/JPMML-SkLearn stack assumes that regressor models are for a single output column only. Therefore, it sees a Numpy array with 60k elements, and assumes that it's `(60000, 1)`.
The converter should be checking the dimensionality of the embedded `_y` variable, and raising a targeted exception if the target is not a 1D array-like object.
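Until the converter performs such a check, something similar can be done on the Python side before conversion. The sketch below is only an illustration: `check_single_output` is a hypothetical helper, not part of SkLearn2PMML or Scikit-Learn, and the zero-filled array merely stands in for `offline_location`.

```python
import numpy as np


def check_single_output(y):
    """Return y as a 1D array, or raise if it holds more than one target column."""
    y = np.asarray(y)
    if y.ndim == 1 or (y.ndim == 2 and y.shape[1] == 1):
        return y.reshape(-1)
    raise ValueError(
        f"Expected a single-output target, got shape {y.shape} with {y.size} values"
    )


# Stand-in for offline_location: 30000 rows, two coordinates per row.
y_two_columns = np.zeros((30000, 2))
print(y_two_columns.size)  # 60000

try:
    check_single_output(y_two_columns)
except ValueError as error:
    print(error)
```

Calling the helper on `knn_reg._y` before `sklearn2pmml(...)` would surface the two-column target as a readable Python error instead of a Java stack trace.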
OK, thank you very much. The inputs are four intensity values from a Bluetooth device and the output is location information, so I need two coordinates. I am still trying to work out how to convert the output. Anyway, thank you again for your help.
> The inputs are four intensity values from a Bluetooth device and the output is location information, so I need two coordinates.
Your model is basically a giant look-up table?
You could use a helper object "location" (some unique integer). So, you'd first map from 4D to "location", and then from "location" to 2D.
The latter transformation could be implemented using the `PMMLPipeline.predict_transformer` attribute. You'd currently need two look-up tables there - one for the "location -> x" mapping, and another one for the "location -> y" mapping.
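A rough sketch of how the "location" ids and the two look-up tables could be derived from the training targets, in plain Python. The sample coordinates are invented, and wiring the tables into `PMMLPipeline.predict_transformer` is not shown. One assumption of this sketch: the id is a label rather than a quantity, so the model that predicts it should return one of the existing ids (averaging ids would be meaningless).

```python
# Hypothetical training targets: one (x, y) coordinate pair per sample.
offline_location = [(0.0, 0.0), (0.0, 2.5), (3.0, 2.5), (0.0, 2.5), (3.0, 2.5)]

# Assign a unique integer id to each distinct coordinate pair.
unique_locations = sorted(set(offline_location))
location_id = {pair: index for index, pair in enumerate(unique_locations)}

# Single-column target to train on instead of the two-column array.
y_location = [location_id[pair] for pair in offline_location]

# The two look-up tables: "location -> x" and "location -> y".
location_to_x = {index: pair[0] for index, pair in enumerate(unique_locations)}
location_to_y = {index: pair[1] for index, pair in enumerate(unique_locations)}


def decode(predicted_id):
    """Map a predicted location id back to its (x, y) coordinates."""
    return location_to_x[predicted_id], location_to_y[predicted_id]


print(y_location)  # [0, 1, 2, 1, 2]
print(decode(2))   # (3.0, 2.5)
```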
In principle, the SkLearn2PMML/JPMML-SkLearn stack is internally pretty close to multi-output support. It's already well supported on the model evaluation side.
The biggest obstacle right now is that the `org.jpmml.converter.Schema` class does not support the multi-label use case (there needs to be an `org.jpmml.converter.MultiSchema` subclass for that). I have it on a pretty elevated position in my internal TODO list. A quick GitHub search didn't reveal any public issues about it, though.
It works. Thank you very much!
> It works.
What works? The suggested work-around using `PMMLPipeline.predict_transformer`?
Anyway, KNN models are prime examples of the multi-output use case.
This is already well supported on the JPMML-Evaluator side. What is missing is some code in the JPMML-Converter and JPMML-SkLearn libraries.
Will work on this already this spring. Don't close the issue, otherwise I might forget!
Sorry, there are some errors when I convert my model into PMML. It seems the data size has been changed, but I do not know why.
Thank you!