jpmml / jpmml-sklearn

Java library and command-line application for converting Scikit-Learn pipelines to PMML
GNU Affero General Public License v3.0

Error casting numpy.int64 to java.lang.Number #74

Closed: vivekk0903 closed this issue 6 years ago

vivekk0903 commented 6 years ago

Related to this Stack Overflow question: https://stackoverflow.com/q/49913330/3374996

System information:

python - Python 2.7.6
sklearn - 0.19.1
sklearn.externals.joblib - 0.11
sklearn_pandas - 1.6.0
sklearn2pmml - 0.35.0
converter-executable-1.5-SNAPSHOT.jar 

Code to reproduce:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.externals import joblib

from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml import sklearn2pmml

iris_data = load_iris()
X, y = iris_data.data, iris_data.target

knn = KNeighborsClassifier(n_neighbors=np.int64(5))
knn.fit(X, y)

pipeline = PMMLPipeline([("knn",knn)])
pipeline.active_fields = np.array(load_iris().feature_names)

joblib.dump(pipeline, 'pipeline.pkl.z', compress=9)
sklearn2pmml(pipeline, "KNNFit_py.pmml", debug=True)

Expectation: PMML file is saved successfully without any error

Actual Output:

Error 
RuntimeError: The JPMML-SkLearn conversion application has failed. 
The Java executable should have printed more information about the failure into 
its standard output and/or standard error streams

When running the Java version:

Apr 19, 2018 12:30:22 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.ClassCastException: numpy.core.Scalar cannot be cast to java.lang.Number
    at sklearn.neighbors.KNeighborsClassifier.getNumberOfNeighbors(KNeighborsClassifier.java:70)
    at sklearn.neighbors.KNeighborsUtil.encodeNeighbors(KNeighborsUtil.java:130)
    at sklearn.neighbors.KNeighborsClassifier.encodeModel(KNeighborsClassifier.java:57)
    at sklearn.neighbors.KNeighborsClassifier.encodeModel(KNeighborsClassifier.java:32)
    at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:161)
    at org.jpmml.sklearn.Main.run(Main.java:145)
    at org.jpmml.sklearn.Main.main(Main.java:94)

Workaround: Replace the np.int64 value with a plain Python int when constructing the estimator:

knn = KNeighborsClassifier(n_neighbors=5)
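
When the value comes from elsewhere as a numpy scalar, it can be converted just before construction instead of being hard-coded. This is a minimal sketch that assumes only numpy. `as_native` is a hypothetical helper name and is not part of sklearn2pmml or scikit-learn.

```python
import numpy as np


def as_native(value):
    """Return a plain Python scalar for numpy scalars; pass other values through."""
    # np.generic is the base class of all numpy scalar types (np.int64, np.float32, ...)
    if isinstance(value, np.generic):
        return value.item()
    return value


# Hypothetical value that arrived as a numpy scalar
n_neighbors = as_native(np.int64(5))
print(type(n_neighbors).__name__, n_neighbors)
```

The converted value can then be passed on, as in KNeighborsClassifier(n_neighbors=n_neighbors).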

Issue: I think jpmml-sklearn should handle numpy scalars by converting them to the appropriate Java number type based on their dtype (at least for simple cases such as int32 and int64). In examples like the one in the Stack Overflow question, it is common to use numpy to generate the ranges and intervals to search over, and values taken from them trigger this error.

Looks like this is the inverse case of the issue discussed here: https://github.com/jpmml/jpmml-sklearn/issues/61
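
The point about numpy ranges can be illustrated with a short sketch. Elements taken from a numpy array are numpy scalar objects, not Python ints, so a search range built with numpy passes them to the estimator. The candidate values below are hypothetical. Calling `tolist()` on the array gives native values.

```python
import numpy as np

# Candidate values built with numpy, as is common when defining a search range
candidates = np.arange(3, 12, 2)

# Each element taken from the array is a numpy scalar (a subclass of np.integer)
numpy_types = [isinstance(k, np.integer) for k in candidates]

# tolist() converts the elements to native Python ints
native_candidates = candidates.tolist()
native_types = [type(k).__name__ for k in native_candidates]

print(numpy_types)
print(native_candidates)
print(native_types)
```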

vruusmann commented 6 years ago

Fixed in https://github.com/jpmml/jpmml-sklearn/commit/9192e0ca4e083442502ab103ea6a41c11a7de86f

vivekk0903 commented 6 years ago

@vruusmann

Hi, after looking here, I have compiled a list of supported estimators and transformers that fail when the following parameters are wrapped in a numpy scalar:

  1. GradientBoostingClassifier, GradientBoostingRegressor -> 'learning_rate'
  2. KNeighborsClassifier, KNeighborsRegressor -> 'p'
  3. SVC, NuSVC, SVR, NuSVR, OneClassSVM -> 'degree', 'coef0'
  4. PolynomialFeatures -> 'degree'
  5. Binarizer -> 'threshold'
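
For parameters like those listed above, one client-side option, independent of the converter, is to normalise a whole parameter dictionary before it reaches the estimator. This is a minimal sketch that assumes only numpy. `native_params` is a hypothetical helper, and the dictionary stands in for the kind of mapping returned by an estimator's `get_params()`.

```python
import numpy as np


def native_params(params):
    """Copy a parameter dict, replacing numpy scalars with plain Python scalars."""
    cleaned = {}
    for name, value in params.items():
        # np.generic covers every numpy scalar type (integers, floats, booleans, ...)
        cleaned[name] = value.item() if isinstance(value, np.generic) else value
    return cleaned


# Hypothetical parameters mixing numpy scalars with native values
params = {"degree": np.int64(3), "coef0": np.float64(0.5), "kernel": "poly"}
cleaned = native_params(params)
print({name: type(value).__name__ for name, value in cleaned.items()})
```

With scikit-learn, the cleaned dictionary could then be applied through `estimator.set_params(**cleaned)`.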