jpmml / jpmml-evaluator-spark

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)
GNU Affero General Public License v3.0

UnmarshalException with Spark 2.0.1 #24

Closed gkiril closed 5 years ago

gkiril commented 5 years ago

I ran into an UnmarshalException with Spark 2.0.1 when calling my readPMML function (I had the same issue with Spark 1.6.1 before upgrading):

javax.xml.bind.UnmarshalException: unexpected element (uri:"http://www.dmg.org/PMML-4_3", local:"PMML"). Expected elements are <{http://www.dmg.org/PMML-4_2}ARIMA>,<{http://www.dmg.org/PMML-4_2}Aggregate>,<{http://www.dmg.org/PMML-4_2}Alternate>,<{http://www.dmg.org/PMML-4_2}Annotation>,<{http://www.dmg.org/PMML-4_2}Anova>,<{http://www.dmg.org/PMML-4_2}AnovaRow>,<{http://www.dmg.org/PMML-4_2}AntecedentSequence>,<{http://www.dmg.org/PMML-4_2}AnyDistribution>,<{http://www.dmg.org/PMML-4_2}Application>,<{http://www.dmg.org/PMML-4_2}Apply>,<{http://www.dmg.org/PMML-4_2}Array>,<{http://www.dmg.org/PMML-4_2}AssociationModel>,<{http://www.dmg.org/PMML-4_2}AssociationRule>,<{http://www.dmg.org/PMML-4_2}Attribute>,<{http://www.dmg.org/PMML-4_2}BaseCumHazardTables>,<{http://www.dmg.org/PMML-4_2}Baseline>,<{http://www.dmg.org/PMML-4_2}BaselineCell>,<{http://www.dmg.org/PMML-4_2}BaselineModel>,<{http://www.dmg.org/PMML-4_2}BaselineStratum>,<{http://www.dmg.org/PMML-4_2}BayesInput>,<{http://www.dmg.org/PMML-4_2}BayesInputs>,<{http://www.dmg.org/PMML-4_2}BayesOutput>,<{http://www.dmg.org/PMML-4_2}BoundaryValueMeans>,<{http://www.dmg.org/PMML-4_2}BoundaryValues>,<{http://www.dmg.org/PMML-4_2}CategoricalPredictor>,<{http://www.dmg.org/PMML-4_2}Categories>,<{http://www.dmg.org/PMML-4_2}Category>,<{http://www.dmg.org/PMML-4_2}CenterFields>,<{http://www.dmg.org/PMML-4_2}Characteristic>,<{http://www.dmg.org/PMML-4_2}Characteristics>,<{http://www.dmg.org/PMML-4_2}ChildParent>,<{http://www.dmg.org/PMML-4_2}ClassLabels>,<{http://www.dmg.org/PMML-4_2}Cluster>,<{http://www.dmg.org/PMML-4_2}ClusteringField>,<{http://www.dmg.org/PMML-4_2}ClusteringModel>,<{http://www.dmg.org/PMML-4_2}ClusteringModelQuality>,<{http://www.dmg.org/PMML-4_2}Coefficient>,<{http://www.dmg.org/PMML-4_2}Coefficients>,<{http://www.dmg.org/PMML-4_2}ComparisonMeasure>,<{http://www.dmg.org/PMML-4_2}Comparisons>,<{http://www.dmg.org/PMML-4_2}ComplexPartialScore>,<{http://www.dmg.org/PMML-4_2}CompoundPredicate>,<{http://ww
w.dmg.org/PMML-4_2}CompoundRule>,<{http://www.dmg.org/PMML-4_2}Con>,<{http://www.dmg.org/PMML-4_2}ConfusionMatrix>,<{http://www.dmg.org/PMML-4_2}ConsequentSequence>,<{http://www.dmg.org/PMML-4_2}Constant>,<{http://www.dmg.org/PMML-4_2}Constraints>,<{http://www.dmg.org/PMML-4_2}ContStats>,<{http://www.dmg.org/PMML-4_2}CorrelationFields>,<{http://www.dmg.org/PMML-4_2}CorrelationMethods>,<{http://www.dmg.org/PMML-4_2}CorrelationValues>,<{http://www.dmg.org/PMML-4_2}Correlations>,<{http://www.dmg.org/PMML-4_2}CountTable>,<{http://www.dmg.org/PMML-4_2}Counts>,<{http://www.dmg.org/PMML-4_2}Covariances>,<{http://www.dmg.org/PMML-4_2}CovariateList>,<{http://www.dmg.org/PMML-4_2}DataDictionary>,<{http://www.dmg.org/PMML-4_2}DataField>,<{http://www.dmg.org/PMML-4_2}Decision>,<{http://www.dmg.org/PMML-4_2}DecisionTree>,<{http://www.dmg.org/PMML-4_2}Decisions>,<{http://www.dmg.org/PMML-4_2}DefineFunction>,<{http://www.dmg.org/PMML-4_2}Delimiter>,<{http://www.dmg.org/PMML-4_2}DerivedField>,<{http://www.dmg.org/PMML-4_2}DiscrStats>,<{http://www.dmg.org/PMML-4_2}Discretize>,<{http://www.dmg.org/PMML-4_2}DiscretizeBin>,<{http://www.dmg.org/PMML-4_2}DocumentTermMatrix>,<{http://www.dmg.org/PMML-4_2}EventValues>,<{http://www.dmg.org/PMML-4_2}ExponentialSmoothing>,<{http://www.dmg.org/PMML-4_2}Extension>,<{http://www.dmg.org/PMML-4_2}FactorList>,<{http://www.dmg.org/PMML-4_2}False>,<{http://www.dmg.org/PMML-4_2}FieldColumnPair>,<{http://www.dmg.org/PMML-4_2}FieldRef>,<{http://www.dmg.org/PMML-4_2}FieldValue>,<{http://www.dmg.org/PMML-4_2}FieldValueCount>,<{http://www.dmg.org/PMML-4_2}GaussianDistribution>,<{http://www.dmg.org/PMML-4_2}GeneralRegressionModel>,<{http://www.dmg.org/PMML-4_2}Header>,<{http://www.dmg.org/PMML-4_2}INT-Entries>,<{http://www.dmg.org/PMML-4_2}INT-SparseArray>,<{http://www.dmg.org/PMML-4_2}Indices>,<{http://www.dmg.org/PMML-4_2}InlineTable>,<{http://www.dmg.org/PMML-4_2}InstanceField>,<{http://www.dmg.org/PMML-4_2}InstanceFields>,<{http://www.dmg.org/PMML-4_2}I
nterval>,<{http://www.dmg.org/PMML-4_2}Item>,<{http://www.dmg.org/PMML-4_2}ItemRef>,<{http://www.dmg.org/PMML-4_2}Itemset>,<{http://www.dmg.org/PMML-4_2}KNNInput>,<{http://www.dmg.org/PMML-4_2}KNNInputs>,<{http://www.dmg.org/PMML-4_2}KohonenMap>,<{http://www.dmg.org/PMML-4_2}Level>,<{http://www.dmg.org/PMML-4_2}LiftData>,<{http://www.dmg.org/PMML-4_2}LiftGraph>,<{http://www.dmg.org/PMML-4_2}LinearNorm>,<{http://www.dmg.org/PMML-4_2}LocalTransformations>,<{http://www.dmg.org/PMML-4_2}MapValues>,<{http://www.dmg.org/PMML-4_2}MatCell>,<{http://www.dmg.org/PMML-4_2}Matrix>,<{http://www.dmg.org/PMML-4_2}MiningBuildTask>,<{http://www.dmg.org/PMML-4_2}MiningField>,<{http://www.dmg.org/PMML-4_2}MiningModel>,<{http://www.dmg.org/PMML-4_2}MiningSchema>,<{http://www.dmg.org/PMML-4_2}MissingValueWeights>,<{http://www.dmg.org/PMML-4_2}ModelExplanation>,<{http://www.dmg.org/PMML-4_2}ModelLiftGraph>,<{http://www.dmg.org/PMML-4_2}ModelStats>,<{http://www.dmg.org/PMML-4_2}ModelVerification>,<{http://www.dmg.org/PMML-4_2}MultivariateStat>,<{http://www.dmg.org/PMML-4_2}MultivariateStats>,<{http://www.dmg.org/PMML-4_2}NaiveBayesModel>,<{http://www.dmg.org/PMML-4_2}NearestNeighborModel>,<{http://www.dmg.org/PMML-4_2}NeuralInput>,<{http://www.dmg.org/PMML-4_2}NeuralInputs>,<{http://www.dmg.org/PMML-4_2}NeuralLayer>,<{http://www.dmg.org/PMML-4_2}NeuralNetwork>,<{http://www.dmg.org/PMML-4_2}NeuralOutput>,<{http://www.dmg.org/PMML-4_2}NeuralOutputs>,<{http://www.dmg.org/PMML-4_2}Neuron>,<{http://www.dmg.org/PMML-4_2}Node>,<{http://www.dmg.org/PMML-4_2}NormContinuous>,<{http://www.dmg.org/PMML-4_2}NormDiscrete>,<{http://www.dmg.org/PMML-4_2}NormalizedCountTable>,<{http://www.dmg.org/PMML-4_2}NumericInfo>,<{http://www.dmg.org/PMML-4_2}NumericPredictor>,<{http://www.dmg.org/PMML-4_2}OptimumLiftGraph>,<{http://www.dmg.org/PMML-4_2}Output>,<{http://www.dmg.org/PMML-4_2}OutputField>,<{http://www.dmg.org/PMML-4_2}PCell>,<{http://www.dmg.org/PMML-4_2}PCovCell>,<{http://www.dmg.org/PMML-4_2}PCovMatr
ix>,<{http://www.dmg.org/PMML-4_2}PMML>,<{http://www.dmg.org/PMML-4_2}PPCell>,<{http://www.dmg.org/PMML-4_2}PPMatrix>,<{http://www.dmg.org/PMML-4_2}PairCounts>,<{http://www.dmg.org/PMML-4_2}ParamMatrix>,<{http://www.dmg.org/PMML-4_2}Parameter>,<{http://www.dmg.org/PMML-4_2}ParameterField>,<{http://www.dmg.org/PMML-4_2}ParameterList>,<{http://www.dmg.org/PMML-4_2}Partition>,<{http://www.dmg.org/PMML-4_2}PartitionFieldStats>,<{http://www.dmg.org/PMML-4_2}PoissonDistribution>,<{http://www.dmg.org/PMML-4_2}PredictiveModelQuality>,<{http://www.dmg.org/PMML-4_2}Predictor>,<{http://www.dmg.org/PMML-4_2}PredictorTerm>,<{http://www.dmg.org/PMML-4_2}Quantile>,<{http://www.dmg.org/PMML-4_2}REAL-Entries>,<{http://www.dmg.org/PMML-4_2}REAL-SparseArray>,<{http://www.dmg.org/PMML-4_2}ROC>,<{http://www.dmg.org/PMML-4_2}ROCGraph>,<{http://www.dmg.org/PMML-4_2}RandomLiftGraph>,<{http://www.dmg.org/PMML-4_2}Regression>,<{http://www.dmg.org/PMML-4_2}RegressionModel>,<{http://www.dmg.org/PMML-4_2}RegressionTable>,<{http://www.dmg.org/PMML-4_2}ResultField>,<{http://www.dmg.org/PMML-4_2}RuleSelectionMethod>,<{http://www.dmg.org/PMML-4_2}RuleSet>,<{http://www.dmg.org/PMML-4_2}RuleSetModel>,<{http://www.dmg.org/PMML-4_2}ScoreDistribution>,<{http://www.dmg.org/PMML-4_2}Scorecard>,<{http://www.dmg.org/PMML-4_2}SeasonalTrendDecomposition>,<{http://www.dmg.org/PMML-4_2}Seasonality_ExpoSmooth>,<{http://www.dmg.org/PMML-4_2}Segment>,<{http://www.dmg.org/PMML-4_2}Segmentation>,<{http://www.dmg.org/PMML-4_2}Sequence>,<{http://www.dmg.org/PMML-4_2}SequenceModel>,<{http://www.dmg.org/PMML-4_2}SequenceReference>,<{http://www.dmg.org/PMML-4_2}SequenceRule>,<{http://www.dmg.org/PMML-4_2}SetPredicate>,<{http://www.dmg.org/PMML-4_2}SetReference>,<{http://www.dmg.org/PMML-4_2}SimplePredicate>,<{http://www.dmg.org/PMML-4_2}SimpleRule>,<{http://www.dmg.org/PMML-4_2}SimpleSetPredicate>,<{http://www.dmg.org/PMML-4_2}SpectralAnalysis>,<{http://www.dmg.org/PMML-4_2}SupportVector>,<{http://www.dmg.org/PMML-4_2}Su
pportVectorMachine>,<{http://www.dmg.org/PMML-4_2}SupportVectorMachineModel>,<{http://www.dmg.org/PMML-4_2}SupportVectors>,<{http://www.dmg.org/PMML-4_2}TableLocator>,<{http://www.dmg.org/PMML-4_2}Target>,<{http://www.dmg.org/PMML-4_2}TargetValue>,<{http://www.dmg.org/PMML-4_2}TargetValueCount>,<{http://www.dmg.org/PMML-4_2}TargetValueCounts>,<{http://www.dmg.org/PMML-4_2}TargetValueStat>,<{http://www.dmg.org/PMML-4_2}TargetValueStats>,<{http://www.dmg.org/PMML-4_2}Targets>,<{http://www.dmg.org/PMML-4_2}Taxonomy>,<{http://www.dmg.org/PMML-4_2}TestDistributions>,<{http://www.dmg.org/PMML-4_2}TextCorpus>,<{http://www.dmg.org/PMML-4_2}TextDictionary>,<{http://www.dmg.org/PMML-4_2}TextDocument>,<{http://www.dmg.org/PMML-4_2}TextIndex>,<{http://www.dmg.org/PMML-4_2}TextIndexNormalization>,<{http://www.dmg.org/PMML-4_2}TextModel>,<{http://www.dmg.org/PMML-4_2}TextModelNormalization>,<{http://www.dmg.org/PMML-4_2}TextModelSimiliarity>,<{http://www.dmg.org/PMML-4_2}Time>,<{http://www.dmg.org/PMML-4_2}TimeAnchor>,<{http://www.dmg.org/PMML-4_2}TimeCycle>,<{http://www.dmg.org/PMML-4_2}TimeException>,<{http://www.dmg.org/PMML-4_2}TimeSeries>,<{http://www.dmg.org/PMML-4_2}TimeSeriesModel>,<{http://www.dmg.org/PMML-4_2}TimeValue>,<{http://www.dmg.org/PMML-4_2}Timestamp>,<{http://www.dmg.org/PMML-4_2}TrainingInstances>,<{http://www.dmg.org/PMML-4_2}TransformationDictionary>,<{http://www.dmg.org/PMML-4_2}TreeModel>,<{http://www.dmg.org/PMML-4_2}True>,<{http://www.dmg.org/PMML-4_2}UniformDistribution>,<{http://www.dmg.org/PMML-4_2}UnivariateStats>,<{http://www.dmg.org/PMML-4_2}Value>,<{http://www.dmg.org/PMML-4_2}VectorDictionary>,<{http://www.dmg.org/PMML-4_2}VectorFields>,<{http://www.dmg.org/PMML-4_2}VectorInstance>,<{http://www.dmg.org/PMML-4_2}VerificationField>,<{http://www.dmg.org/PMML-4_2}VerificationFields>,<{http://www.dmg.org/PMML-4_2}XCoordinates>,<{http://www.dmg.org/PMML-4_2}YCoordinates>,<{http://www.dmg.org/PMML-4_2}binarySimilarity>,<{http://www.dmg.org/PMML-4_2}che
bychev>,<{http://www.dmg.org/PMML-4_2}cityBlock>,<{http://www.dmg.org/PMML-4_2}euclidean>,<{http://www.dmg.org/PMML-4_2}jaccard>,<{http://www.dmg.org/PMML-4_2}minkowski>,<{http://www.dmg.org/PMML-4_2}row>,<{http://www.dmg.org/PMML-4_2}simpleMatching>,<{http://www.dmg.org/PMML-4_2}squaredEuclidean>,<{http://www.dmg.org/PMML-4_2}tanimoto>
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:647)
    at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:258)
    at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:253)
    at com.sun.xml.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:120)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1052)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:483)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:464)
    at com.sun.xml.bind.v2.runtime.unmarshaller.SAXConnector.startElement(SAXConnector.java:152)
    at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
    at org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
    at org.jpmml.model.filters.PMMLFilter.startElement(PMMLFilter.java:69)
    at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
    at org.apache.xerces.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
    at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:216)
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:189)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:140)
    at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:123)
    at org.jpmml.model.JAXBUtil.unmarshal(JAXBUtil.java:78)
    at org.jpmml.model.JAXBUtil.unmarshalPMML(JAXBUtil.java:64)
    at org.jpmml.model.PMMLUtil.unmarshal(PMMLUtil.java:31)
    at ConfidencePredictor.readPMML(ConfidencePredictor.java:187)
       ....

In my pom file, I wrote the dependency

<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>jpmml-evaluator-spark</artifactId>
    <version>1.2.0</version>
</dependency>

which is used alongside the Spark dependency:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.1</version>
</dependency>

I know that Spark ships with a different version of PMML, which conflicts with the newer versions. But isn't this project meant to solve that issue too? Is this a bug, or am I doing something wrong?

vruusmann commented 5 years ago

Your Apache Spark application is packaged incorrectly. Specifically, you need to shade all org.dmg.pmml.* and org.jpmml.model.* classes.

Again, this is clearly stated in the README file: https://github.com/jpmml/jpmml-sparkml#library

All this "conflict resolution" documentation is there for a reason.
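
For reference, relocating those packages with the Maven Shade Plugin could look roughly like the fragment below. This is only a sketch, not the project's official configuration: the two package patterns follow the comment above, while the `org.shaded` prefix and the plugin version are assumptions chosen for illustration. The linked README remains the authoritative source.

```xml
<!-- Goes inside <build><plugins> of the application's pom.xml -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <!-- Assumed version; use whichever release your build standardizes on -->
    <version>3.1.0</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- Move the PMML class model out of the way of the copy on Spark's classpath -->
                    <relocation>
                        <pattern>org.dmg.pmml</pattern>
                        <!-- "org.shaded" is an arbitrary, illustrative prefix -->
                        <shadedPattern>org.shaded.dmg.pmml</shadedPattern>
                    </relocation>
                    <!-- Same for the JPMML-Model utility classes (JAXBUtil, PMMLUtil, ...) -->
                    <relocation>
                        <pattern>org.jpmml.model</pattern>
                        <shadedPattern>org.shaded.jpmml.model</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>
```

With relocation in place, the application JAR carries its own renamed copies of these classes, so the PMML 4.2 classes visible in the stack trace above should no longer be picked up in their place.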

gkiril commented 5 years ago

Ah, OK, I didn't see the documentation in jpmml-sparkml (I was looking only in this repo, jpmml-evaluator-spark). It would be useful if this information also appeared in the jpmml-evaluator-spark documentation. Thanks for your answer!