wuangKKK closed this issue 4 years ago
Example?
I saved the model like this
@vruusmann
The PMML looks like this @vruusmann
See the source code of the DictVectorizer converter: https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/java/sklearn/feature_extraction/DictVectorizer.java
Your feature is considered to be numeric, because the DictVectorizer.separator attribute is not specified.
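For context, scikit-learn's DictVectorizer names the one-hot column for a string value as key + separator + value (the default separator is "="), while numeric values keep the bare key. Below is a minimal pure-Python sketch of how a converter could use that naming convention to guess the field type. The helper `infer_field_types` and the sample feature names are hypothetical, written for illustration only; this is not the actual JPMML-SkLearn code.

```python
def infer_field_types(feature_names, separator="="):
    """Guess (optype, dataType) per input field from DictVectorizer-style names.

    Hypothetical illustration only; not the JPMML-SkLearn implementation.
    """
    fields = {}
    for name in feature_names:
        # Without a usable separator there is nothing to split on
        key, sep, _value = name.partition(separator) if separator else (name, "", "")
        if sep:
            # "Employment=Private" -> categorical field "Employment"
            fields.setdefault(key, ("categorical", "string"))
        else:
            # No separator found: the whole name is treated as a numeric field
            fields.setdefault(name, ("continuous", "double"))
    return fields


names = ["Age", "Employment=Private", "Employment=Consultant", "Hours"]
print(infer_field_types(names))
# With no separator available, every column falls back to numeric
print(infer_field_types(names, separator=None))
```

Under this assumption, a missing separator means even "Employment=Private" would be reported as a continuous field, which matches the symptom described in this issue.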
Looking at the source code of the DictVectorizer converter again, the field type is determined differently for new fields (this issue) and existing fields (my integration tests). This needs to be unified.
My code:
from sklearn.feature_extraction import DictVectorizer
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from xgboost import XGBClassifier
import pandas
df = pandas.read_csv("Audit.csv")
df_X = df[df.columns.values[0:-1]]
df_X = df_X.to_dict("records")
df_y = df["Adjusted"]
pipeline = PMMLPipeline([
    ("mapper", DictVectorizer()),
    ("classifier", XGBClassifier())
])
pipeline.fit(df_X, df_y)
sklearn2pmml(pipeline, "Audit.pmml")
All continuous and categorical features are correctly detected as continuous+double and categorical+string, respectively.
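One way to check this yourself is to list the DataField elements of the generated PMML file. The sketch below uses only the Python standard library. The embedded `SAMPLE_PMML` fragment is a hypothetical, heavily trimmed stand-in for a real file, and `list_data_fields` is an illustrative helper name, not part of SkLearn2PMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PMML fragment; a real file has many more elements
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <DataDictionary>
    <DataField name="Adjusted" optype="categorical" dataType="integer"/>
    <DataField name="Age" optype="continuous" dataType="double"/>
    <DataField name="Employment" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""


def list_data_fields(pmml_text):
    """Return {field name: (optype, dataType)} from the PMML DataDictionary."""
    root = ET.fromstring(pmml_text)
    fields = {}
    for elem in root.iter():
        # Strip the "{namespace}" prefix so the PMML schema version does not matter
        if elem.tag.rsplit("}", 1)[-1] == "DataField":
            fields[elem.get("name")] = (elem.get("optype"), elem.get("dataType"))
    return fields


for name, (optype, data_type) in list_data_fields(SAMPLE_PMML).items():
    print(f"{name}: {optype}+{data_type}")
```

To inspect an actual export, pass the file contents instead of the sample string, for example the text read from "Audit.pmml".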
Closing as "not reproducible". Whatever the problem, it must be related to your own application code, not the JPMML-SkLearn/SkLearn2PMML stack.