cug-wuyu closed 4 years ago
java.lang.IllegalArgumentException: Field fea_rong_0 has data type string at org.jpmml.converter.PMMLEncoder.toContinuous(PMMLEncoder.java:208) at org.jpmml.converter.CategoricalFeature.toContinuousFeature(CategoricalFeature.java:57)
You're trying to use a categorical/string feature in a context which requires a continuous/numeric feature.
In the current case, please encode categorical features using the OneHotEncoder (or similar) transformation (the data flow for string columns should be StringIndexer -> OneHotEncoder -> VectorAssembler).
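A minimal sketch of that data flow, assuming the Apache Spark 3.x ML API (where OneHotEncoder is an estimator with setInputCols/setOutputCols). The object name, the helper buildPipeline, the "_idx" and "_vec" column suffixes and the "features" output column are all hypothetical names chosen for illustration; none of them come from the reporter's code.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

object CategoricalPipelineSketch {

  // Builds StringIndexer -> OneHotEncoder -> VectorAssembler -> estimator.
  // `estimator` is the final model stage; it is expected to read the
  // assembled "features" column (hypothetical column name).
  def buildPipeline(
      stringCol: String,
      numericCols: Array[String],
      estimator: PipelineStage): Pipeline = {

    // Step 1: map each distinct string value to a numeric category index.
    val indexer = new StringIndexer()
      .setInputCol(stringCol)
      .setOutputCol(stringCol + "_idx")

    // Step 2: turn the category index into a sparse binary vector, so the
    // model does not treat the index as an ordered numeric quantity.
    val encoder = new OneHotEncoder()
      .setInputCols(Array(stringCol + "_idx"))
      .setOutputCols(Array(stringCol + "_vec"))

    // Step 3: assemble the numeric columns and the encoded vector.
    // The raw index column is deliberately NOT passed to the assembler.
    val assembler = new VectorAssembler()
      .setInputCols(numericCols :+ (stringCol + "_vec"))
      .setOutputCol("features")

    new Pipeline().setStages(
      Array[PipelineStage](indexer, encoder, assembler, estimator))
  }
}
```

In the reporter's setup the XGBoostRegressor would be the estimator argument, configured to read the assembled vector column. The point of the sketch is only the stage order: the StringIndexer output goes through OneHotEncoder before it reaches VectorAssembler.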
This is not a bug. In fact, the JPMML-SparkML library helped to reveal an invalid Apache Spark ML pipeline here.
@vruusmann Thanks, but in issue #73 the pipeline structure (StringIndexer, VectorAssembler, RF) is similar to mine. Why did he succeed in creating a PMML file while I failed?
@cug-wuyu The pipeline presented in issue #73 is also invalid - categorical features have NOT been properly prepared there.
Apache Spark ML lets you do stupid things. JPMML-SparkML informs you about the most critical mistakes (e.g. improper encoding of categorical features), hoping that you'll appreciate it and fix your mistake.
Sure, the exception message could be more informative.
When I save a pipeline that contains StringIndexer (used to label-encode the categorical features), VectorAssembler and XGBoostRegressor as a PMML file, the program prints the following error:
java.lang.IllegalArgumentException: Field fea_rong_0 has data type string
    at org.jpmml.converter.PMMLEncoder.toContinuous(PMMLEncoder.java:208)
    at org.jpmml.converter.CategoricalFeature.toContinuousFeature(CategoricalFeature.java:57)
    at org.jpmml.converter.Feature.toContinuousFeature(Feature.java:53)
    at org.jpmml.sparkml.xgboost.BoosterUtil$1.apply(BoosterUtil.java:69)
    at org.jpmml.sparkml.xgboost.BoosterUtil$1.apply(BoosterUtil.java:57)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.jpmml.converter.Schema.toTransformedSchema(Schema.java:97)
    at org.jpmml.sparkml.xgboost.BoosterUtil.encodeBooster(BoosterUtil.java:80)
    at org.jpmml.sparkml.xgboost.XGBoostRegressionModelConverter.encodeModel(XGBoostRegressionModelConverter.java:40)
    at org.jpmml.sparkml.xgboost.XGBoostRegressionModelConverter.encodeModel(XGBoostRegressionModelConverter.java:28)
    at org.jpmml.sparkml.ModelConverter.registerModel(ModelConverter.java:171)
    at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:120)
    at com.rong360.jianhang.spark.newSpark.RunXGBRegression$.main(RunXGBRegression.scala:47)
    at com.rong360.jianhang.spark.newSpark.RunXGBRegression.main(RunXGBRegression.scala)