Closed PowerToThePeople111 closed 5 years ago
My bad, it was documented, but I did not see it before: using the PysparkPipelineWrapper.unwrap method helps. ;)
from pyspark.ml import PipelineModel
from sparkflow.pipeline_util import PysparkPipelineWrapper

# unwrap() restores the sparkflow stages inside the loaded PipelineModel
pipeline = PysparkPipelineWrapper.unwrap(PipelineModel.load("s3://abucket/mnist_model"))
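With the unwrapped pipeline, transforming the test set then works as usual. A rough sketch, assuming spark is the active SparkSession and the test CSV has the same layout as at training time:

# read the test set (path and read options are illustrative)
test_df = spark.read.option("inferSchema", "true").csv("s3://abucket/mnist_test.csv")
predictions = pipeline.transform(test_df)
predictions.show(5)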
Thanks for filing the issue, as it reminded me that I believe pyspark recently added the capability for custom Transformers to be saved/loaded. That is why this hack had to go in, and I have wanted it gone for a while now.
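If I remember correctly, that capability is the DefaultParamsReadable/DefaultParamsWritable mixins in pyspark.ml.util. A rough sketch of a custom Transformer that would persist without any wrapper (the transformer itself is purely illustrative):

from pyspark import keyword_only
from pyspark.ml import Transformer
from pyspark.ml.param.shared import HasInputCol, HasOutputCol
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable
import pyspark.sql.functions as F

class DoubleColumn(Transformer, HasInputCol, HasOutputCol,
                   DefaultParamsReadable, DefaultParamsWritable):
    """Toy transformer that doubles the input column; the two mixins
    give it save()/load() support without a wrapper."""

    @keyword_only
    def __init__(self, inputCol=None, outputCol=None):
        super(DoubleColumn, self).__init__()
        self._set(**self._input_kwargs)

    def _transform(self, df):
        return df.withColumn(self.getOutputCol(), F.col(self.getInputCol()) * 2.0)

A stage built this way can be saved and loaded as part of a plain PipelineModel, which is what would make the unwrap step unnecessary.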
I am happy if that helped in some way. :)
Hi guys,
I am using sparkflow 0.7.0 on a Spark 2.4 EMR cluster and am trying to load a pipeline that was created with the code from one of your example scripts.
After creating the model, I tried to load the pipeline again and transform the mnist_test.csv dataset.
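The loading attempt looks roughly like this (reading of the test CSV into test_df omitted):

from pyspark.ml import PipelineModel

# load the saved pipeline directly, i.e. without PysparkPipelineWrapper.unwrap
pipeline = PipelineModel.load("s3://abucket/mnist_model")
predictions = pipeline.transform(test_df)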
This leads to the following stacktrace:
When executing alternatively:
Maybe there is something wrong with serialisation and deserialisation?
Best,