jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

java.lang.IllegalArgumentException: Expected a fitted pipeline model (class org.apache.spark.ml.PipelineModel), got a pipeline stage (class org.apache.spark.ml.PipelineModel) #95

Closed riedel closed 4 years ago

riedel commented 4 years ago

I get this error although I am able to score the model (thus it should be fitted). At a minimum the message is very strange.

Complete backtrace using sparklyr2pmml:

Error: java.lang.IllegalArgumentException: Expected a fitted pipeline model (class org.apache.spark.ml.PipelineModel), got a pipeline stage (class org.apache.spark.ml.PipelineModel)
    at org.jpmml.sparkml.PMMLBuilder.<init>(PMMLBuilder.java:84)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at sparklyr.Invoke.invoke(invoke.scala:166)
    at sparklyr.StreamHandler.handleMethodCall(stream.scala:136)
    at sparklyr.StreamHandler.read(stream.scala:61)
    at sparklyr.BackendHandler$$anonfun$channelRead0$1.apply$mcV$sp(handler.scala:58)
    at scala.util.control.Breaks.breakable(Breaks.scala:38)
    at sparklyr.BackendHandler.channelRead0(handler.scala:38)
    at sparklyr.BackendHandler.channelRead0(handler.scala:14)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:748)

vruusmann commented 4 years ago

At a minimum the message is very strange.

The org.jpmml.sparkml.PMMLBuilder class defines two constructors: the real one, and a fake one that exists only to catch invalid PMMLBuilder constructor invocations. See https://github.com/jpmml/jpmml-sparkml/blob/1.6.1/src/main/java/org/jpmml/sparkml/PMMLBuilder.java#L78-L85

Your code is invoking this fake constructor, even though you're actually supplying a real PipelineModel object to it.

It's possible to invoke the fake constructor deliberately by upcasting the argument to PipelineStage, but I doubt that's the case here:

PipelineModel pipelineModel = pipeline.fit(...);

PMMLBuilder pmmlBuilder = new PMMLBuilder((PipelineStage)pipelineModel);
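
To illustrate why the cast matters, here is a minimal, self-contained sketch of Java overload resolution. The classes Stage, FittedModel and Builder are hypothetical stand-ins (they are not the Spark or JPMML-SparkML classes), and the sketch assumes the fake constructor takes the supertype as its parameter. The compiler picks the constructor from the static type of the argument, so the same object reaches a different constructor once it is cast to its supertype:

```java
// Hypothetical demo class; none of these names come from Spark or JPMML-SparkML.
class OverloadDemo {
    public static void main(String[] args) {
        FittedModel model = new FittedModel();

        // Static type is FittedModel: the most specific overload is selected.
        System.out.println(new Builder(model).chosen);

        // Static type is Stage after the cast: the catch-all overload is selected,
        // even though the runtime object is still a FittedModel.
        System.out.println(new Builder((Stage) model).chosen);
    }
}

// Stand-in for a generic pipeline stage (assumption: plays the role of PipelineStage).
class Stage {}

// Stand-in for a fitted model (assumption: plays the role of PipelineModel).
class FittedModel extends Stage {}

// Stand-in for a builder with a "real" and a "catch-all" constructor.
class Builder {
    final String chosen;

    // The "real" constructor, accepting the fitted model type.
    Builder(FittedModel model) {
        this.chosen = "real";
    }

    // The catch-all constructor; a real implementation might throw
    // IllegalArgumentException here instead of recording the choice.
    Builder(Stage stage) {
        this.chosen = "catch-all";
    }
}
```

A caller that selects a constructor reflectively at runtime, rather than through the compiler, does not get this static resolution for free, since both parameter types accept a FittedModel instance.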

Complete backtrace using sparklyr2pmml

What is your Apache Spark version, and Sparklyr2PMML/JPMML-SparkML versions?

Can you share a R code snippet that deals with PMML conversion?

riedel commented 4 years ago

Thanks for the explanation, at least now I can make sense of it! I updated to version 1.6.1 (since I realized that sparklyr recently started to support Spark 3.0.0) and I haven't had the error since. (However, it was also sporadic before, using both Spark 2.3 and 2.4 with 1.4.14 and 1.5.7 respectively.) If I can reproduce it reliably, I will reopen the ticket.