mpeychev closed this issue 6 years ago
As a follow-up, my PMMLPipeline is quite simple. I am producing the pickle file using the code below:
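from sklearn2pmml.pipeline import PMMLPipeline
import joblib

# Single-step pipeline: the estimator itself, no preprocessing stages
pmml_pipeline = PMMLPipeline([
    ("classifier", learning_model)
])
pmml_pipeline.fit(X_train, Y_train)
# Embed a small sample of training rows as verification data
pmml_pipeline.verify(X_train.sample(n=15))
joblib.dump(pmml_pipeline, "pmml_pipeline.pkl.z", compress=9)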
X_train and Y_train are pandas DataFrames.
Are there any other pipeline stages that I should define but am missing?
So I found that this issue happens when some of the features have numbers as names. Is this expected behaviour?
java.lang.IllegalArgumentException: Array attribute 'sklearn2pmml.PMMLPipeline.active_fields' contains an unsupported value (Java class java.lang.Integer)
As the above exception message suggests, the value of the PMMLPipeline.active_fields attribute must be a list of strings:
pipeline = PMMLPipeline(...)
pipeline.active_fields = ["1", "2", "3"] # YES!
If the list contains any non-string elements, then the converter fails:
pipeline.active_fields = [1, 2, 3] # NO-NO-NO!
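A quick way to see which names would trip the converter is to filter a list of field names for non-string entries before exporting. This is only a sketch: `non_string_names` is a hypothetical helper, not part of sklearn2pmml, and the lists below are made-up examples.

```python
def non_string_names(names):
    """Return the entries of `names` that are not Python strings."""
    return [name for name in names if not isinstance(name, str)]


# Hypothetical field-name lists, mirroring the YES / NO-NO-NO cases above
good = ["1", "2", "3"]
bad = [1, "2", 3.0]

print(non_string_names(good))  # []
print(non_string_names(bad))   # [1, 3.0]
```

An empty result means every name is already a string; anything else lists the offending names.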
So I found that this issue happens when some of the features have numbers as names.
I didn't know that pandas.DataFrame supports non-string column names. If this is "official" behaviour, then I'll improve my code. If this is "unofficial" behaviour, then I'll close this issue as invalid.
In the meantime, simply convert your column names to strings.
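One possible way to do that conversion in pandas, as a minimal sketch: the small `X_train` frame here is invented for illustration and stands in for the real training data, which would be converted before calling `fit`.

```python
import pandas as pd

# Made-up stand-in for the real training data: two integer column
# names and one string column name.
X_train = pd.DataFrame(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    columns=[0, 1, "age"],
)

# Cast every column label to a string; existing string labels are unchanged.
X_train.columns = X_train.columns.astype(str)

print(list(X_train.columns))  # ['0', '1', 'age']
```

After this step, a pipeline fitted on `X_train` should record string feature names only.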
I had a similar issue, but my training data is a vectorized sparse matrix, i.e. the output of a CountVectorizer. How can I change the column names of such a dataset?
Hi, I am trying to convert a scikit-learn random forest classifier to a PMML file but am getting the following exception:
The same issue occurred when I tried your other tool, sklearn2pmml. Do you have any suggestions as to what the problem might be?
Thank you!