jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

Exception in thread "main" java.lang.IllegalArgumentException: skip #48

Closed robin-su closed 6 years ago

robin-su commented 6 years ago

I am using Spark 2.1.1 and jpmml 1.2.12. Running my job fails with the following error:

Exception in thread "main" java.lang.IllegalArgumentException: skip
    at org.jpmml.sparkml.feature.StringIndexerModelConverter.encodeFeatures(StringIndexerModelConverter.java:65)
    at org.jpmml.sparkml.FeatureConverter.registerFeatures(FeatureConverter.java:47)
    at org.jpmml.sparkml.PMMLBuilder.build(PMMLBuilder.java:114)
    at com.nubia.train.Ad_ctr_train$.main(Ad_ctr_train.scala:182)
    at com.nubia.train.Ad_ctr_train.main(Ad_ctr_train.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

vruusmann commented 6 years ago

The "skip" value of the StringIndexer#handleInvalid param is not supported, because it doesn't make sense from the model evaluation perspective: a model evaluator consumes one data record and is expected to return one result, so there is no meaningful "void" result it could return for a skipped record.

The workaround is to specify "error" as the param value, and to implement the skip logic manually in the application that evaluates the model:

try {
  evaluator.evaluate(arguments);
} catch(InvalidResultException ire){
  // Skip data record
}
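
For anyone who wants to try the pattern without a Spark or JPMML setup, here is a minimal, self-contained sketch of the same idea. Everything in it is a hypothetical stand-in: the class SkipInvalidRecords, the exception InvalidRecordException (playing the role of InvalidResultException), the "platform" field and the label values are made up for illustration and are not part of any library. The evaluate method only mimics "error" semantics by throwing on a label that was not seen during training; the loop around it supplies the "skip" behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a model whose indexer was fitted with handleInvalid = "error".
class SkipInvalidRecords {

    // Hypothetical exception type; it plays the role of InvalidResultException here.
    static class InvalidRecordException extends RuntimeException {
        InvalidRecordException(String message) {
            super(message);
        }
    }

    // Labels seen during training (illustrative values only).
    static final List<String> LABELS = Arrays.asList("android", "ios", "web");

    // Mimics "error" semantics: returns the label index, or throws for an unseen label.
    static double evaluate(Map<String, String> arguments) {
        String value = arguments.get("platform");
        int index = LABELS.indexOf(value);
        if (index < 0) {
            throw new InvalidRecordException("Unseen label: " + value);
        }
        return index;
    }

    // Scores every record and drops the ones that fail evaluation ("skip" semantics).
    static List<Double> scoreAll(List<Map<String, String>> records) {
        List<Double> results = new ArrayList<>();
        for (Map<String, String> record : records) {
            try {
                results.add(evaluate(record));
            } catch (InvalidRecordException e) {
                // Skip data record
            }
        }
        return results;
    }

    // Builds a single-field data record.
    static Map<String, String> record(String platform) {
        Map<String, String> arguments = new HashMap<>();
        arguments.put("platform", platform);
        return arguments;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = Arrays.asList(
            record("ios"), record("blackberry"), record("web"));
        // "blackberry" is unseen, so only two results remain.
        System.out.println(scoreAll(records)); // prints [1.0, 2.0]
    }
}
```

Keeping the skip decision in the calling loop means the caller can also choose to log or count the dropped records instead of discarding them silently.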