Open Keyeoh opened 5 years ago
Hi @Keyeoh , I'm not quite sure about the error "module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegressionModel'" but in regards to this: "I found some references in your docs about using saveNativeModel()" This method just saves the model in the native lightgbm format. It is just a file specifying the tree structure. You can re-load that file in any environment - either the lightgbm package in python (on the Booster), or lightgbm in R or the native C++ api - or even in mmlspark.
In addition to that file you should be able to save and load the LightGBM learner in the same way as any other spark pipeline, and we have tests that cover this. I'm not quite sure why you are getting that error though, it's almost as if mmlspark python bindings are not installed.
Hi @imatiach-msft ,
Thank you for your response. I understand the role of saveNativeModel() now. Very interesting indeed, as I sometimes switch to R in my projects.
However, with respect to the original module/attribute problem, I still think it is quite strange, since I am using two very simple scripts, one to save the model once trained and the other to load it. Both of them are run inside the same conda environment, both import mmlspark, and in both cases the scripts are run using spark-submit with the --packages Azure:mmlspark:0.17 argument.
I am wondering if it could have something to do with the fact that what I am trying to save and load is a complete PipelineModel that happens to contain a LightGBMRegressionModel as its last stage. It is just as if the pyspark PipelineModel.load() method did not know how to deal with this mmlspark class. Have you tested this LightGBM-inside-pipeline scenario?
Once I have a trained PipelineModel, I use the following line to save it:
model.write().overwrite().save(args["--output"])
And the following to try to load it again:
model = ml.PipelineModel.load(args["<path_model>"])
Do you think it might be related to the pipeline issue? I am trying to debug to the point just before the model is saved, to see if it is valid and able to predict something, just to rule out that the saved file might be corrupted. I'll keep you informed.
Regards, Gus.
Me again,
I have been able to remote debug my training script in order to stop exactly at the point after training and just before saving the model to disk. I wanted to check if the model was trained properly.
The model is ok, able to predict and I could also extract some metrics using an evaluator. At that point, and with a valid model in hand, I could reproduce the error:
model.write().overwrite().save("foomodel")
None
ml.PipelineModel.load("foomodel")
AttributeError: module 'com.microsoft.ml.spark' has no attribute 'LightGBMRegressionModel'
My guess is that something in the PipelineModel.load() method is not able to recognize the mmlspark bindings. Notice that I executed those statements in the same stopped process.
Regards, Gus
@Keyeoh is this still an issue with v0.18.1? Thanks for your help!
I have the same problem with 0.18.1, but for the LightGBMRankerModel. Exact same stacktrace: I am not able to load the model after training and saving it. I'm working on Databricks.
is this specifically for loading the pipeline in a different environment from where it was saved? I was able to get very far and reproduce this issue, and it may be related to this spark issue:
https://issues.apache.org/jira/browse/SPARK-20765
Instructions to reproduce:
1.) build mmlspark python library
2.) Run:
pyspark --jars /home/ilya/mmlspark/target/scala-2.11/mmlspark_2.11-0.18.1-21-671b6889-20190908-1458-SNAPSHOT.jar --packages com.microsoft.ml.lightgbm:lightgbmlib:2.2.400
3.) Run code below:
import pandas as pd
import numpy as np
import pyspark.ml, pyspark.ml.feature
from pyspark import SparkContext
from pyspark.sql import SQLContext, SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.regression import LinearRegression
from mmlspark.lightgbm.LightGBMClassifier import LightGBMClassifier
from pyspark.ml.feature import Tokenizer
from mmlspark.train import TrainClassifier
from mmlspark.featurize import ValueIndexer
tmp1 = {
    "col1": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "col2": [2, 3, 4, 5, 1, 3, 3, 4, 0, 2, 3, 4],
    "col3": [0.50, 0.40, 0.78, 0.12, 0.50, 0.40, 0.78, 0.12, 0.50, 0.40, 0.78, 0.12],
    "col4": [0.60, 0.50, 0.99, 0.34, 0.60, 0.50, 0.99, 0.34, 0.60, 0.50, 0.99, 0.34]
}
sqlC = SQLContext(sc)
pddf = pd.DataFrame(tmp1)
pddf["col1"] = pddf["col1"].astype(np.float64)
pddf["col2"] = pddf["col2"].astype(np.int32)
data = sqlC.createDataFrame(pddf)
from pyspark.ml.feature import VectorAssembler
assembler = VectorAssembler(
    inputCols=["col2", "col3", "col4"],
    outputCol="features")
data_assembled = assembler.transform(data).select("features", "col1")
lgbm = LightGBMClassifier(featuresCol="features", labelCol="col1", objective="binary")
from pyspark.ml import Pipeline, PipelineModel
pipeline = Pipeline(stages=[lgbm])
pipeline_model = pipeline.fit(data_assembled)
pipeline_model.write().overwrite().save("lgbm-model-1")
loaded_model = PipelineModel.load("lgbm-model-1")
loaded_model.transform(data_assembled)
4.) In a different shell, start same env:
pyspark --jars /home/ilya/mmlspark/target/scala-2.11/mmlspark_2.11-0.18.1-21-671b6889-20190908-1458-SNAPSHOT.jar --packages com.microsoft.ml.lightgbm:lightgbmlib:2.2.400
from pyspark.ml import Pipeline, PipelineModel
loaded_model = PipelineModel.load("lgbm-model-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ilya/lib/spark/python/pyspark/ml/util.py", line 362, in load
    return cls.read().load(path)
  File "/home/ilya/lib/spark/python/pyspark/ml/pipeline.py", line 242, in load
    return JavaMLReader(self.cls).load(path)
  File "/home/ilya/lib/spark/python/pyspark/ml/util.py", line 304, in load
    return self._clazz._from_java(java_obj)
  File "/home/ilya/lib/spark/python/pyspark/ml/pipeline.py", line 299, in _from_java
    py_stages = [JavaParams._from_java(s) for s in java_stage.stages()]
  File "/home/ilya/lib/spark/python/pyspark/ml/pipeline.py", line 299, in <listcomp>
    py_stages = [JavaParams._from_java(s) for s in java_stage.stages()]
  File "/home/ilya/lib/spark/python/pyspark/ml/wrapper.py", line 227, in _from_java
    py_type = __get_class(stage_name)
  File "/home/ilya/lib/spark/python/pyspark/ml/wrapper.py", line 221, in __get_class
    m = __import__(module)
ModuleNotFoundError: No module named 'com.microsoft.ml.spark'
However, as soon as I do any import from mmlspark, for example:
from mmlspark.train import TrainClassifier
loading then works: loaded_model = PipelineModel.load("lgbm-model-1")
It seems that the problem is that our python namespaces are different from scala, and spark can't handle that well.
see this comment specifically:
"PySpark will get the Python class name from Scala class name by replacing "org.apache.spark" with "pyspark". e.g. Scala class name is: "org.apache.spark.ml.regression.LinearRegression", then replace "org.apache.spark" with "pyspark" to get python class name "pyspark.ml.regression.LinearRegression".
So if 3rd party class name in Scala does not contain "org.apache.spark", say "com.abc.xyz.ml.SomeClass", by replacing "org.apache.spark" with "pyspark", the python class name is still "com.abc.xyz.ml.SomeClass", same as Scala class name.
That is:
Otherwise, we get wrong python class name when load persisted content. "
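To make the quoted rule concrete, here is a minimal standalone sketch of the name mapping and the lookup that follows it. It is not PySpark's actual code: the helper names python_class_name and get_class are made up for illustration, and the lookup uses importlib instead of whatever PySpark does internally. It only assumes that no Python package named "com" is installed.

```python
import importlib

SCALA_PREFIX = "org.apache.spark"
PYTHON_PREFIX = "pyspark"


def python_class_name(scala_class_name):
    """Map a Scala class name to the Python name by prefix replacement."""
    return scala_class_name.replace(SCALA_PREFIX, PYTHON_PREFIX)


def get_class(qualified_name):
    """Import the module part of a dotted name and return the named attribute."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


spark_name = python_class_name("org.apache.spark.ml.regression.LinearRegression")
third_party_name = python_class_name(
    "com.microsoft.ml.spark.lightgbm.LightGBMRegressionModel")

print(spark_name)        # pyspark.ml.regression.LinearRegression
print(third_party_name)  # unchanged: still the Scala name

# The lookup works for any importable dotted name (stdlib example here) ...
ordered_dict = get_class("collections.OrderedDict")

# ... but fails for the unchanged Scala name unless a Python module with
# that exact dotted path happens to exist.
try:
    get_class(third_party_name)
    lookup_error = None
except (ImportError, AttributeError) as exc:
    lookup_error = exc
print(type(lookup_error).__name__)
```

The second half is the situation in this thread: the Scala name passes through the replacement untouched, so the Python side ends up looking for a module path that only exists on the JVM side.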
@Keyeoh please run
import mmlspark.train
before loading so our loading monkeypatch can take effect. In the future this should work with just import mmlspark
but ilya mentioned that this needs a patch to work again
sorry, let me reopen this for now as I think there is more that can be investigated
Sorry I disappeared for a while. I have finally managed to install version 0.18.1 on my machine, but it seems the problem is still there.
I have executed my workflow and generated the corresponding pipeline containing a LightGBMRegressor. But when I try to reload it again, this is what happens:
(ninabrlong) gfernandez@VM-Ubuntu:/mnt/data/gfernandez/ninabrlong_testing$ pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:0.18.1
[...]
Using Python version 3.6.6 (default, Oct 9 2018 12:34:16)
SparkSession available as 'spark'.
>>> import pyspark.ml as ml
>>> import mmlspark.train
>>> foo = ml.PipelineModel.load('profiling/model')
2019-09-26 12:31:27 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
2019-09-26 12:31:30 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/util.py", line 311, in load
return cls.read().load(path)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/pipeline.py", line 244, in load
uid, stages = PipelineSharedReadWrite.load(metadata, self.sc, path)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/pipeline.py", line 378, in load
stage = DefaultParamsReader.loadParamsInstance(stagePath, sc)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/util.py", line 535, in loadParamsInstance
py_type = DefaultParamsReader.__get_class(pythonClassName)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/util.py", line 478, in __get_class
m = getattr(m, comp)
AttributeError: module 'com.microsoft.ml.spark.lightgbm' has no attribute 'LightGBMRegressionModel'
@Keyeoh strange, I have tried that and it seemed to work... does it work for you if you import lightgbm? For example:
from mmlspark.lightgbm import LightGBMClassifier
You mean in an isolated PySpark shell? Sometimes I am afraid I am getting lost due to my lack of knowledge. I have opened a pyspark shell with the mmlspark 0.18.1 and imported what you said and it seems to work.
(ninabrlong) gfernandez@VM-Ubuntu:/mnt/data/gfernandez/ninabrlong_testing/ninabrlong$ pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:0.18.1
Python 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /home/gfernandez/.ivy2/cache
The jars for the packages stored in: /home/gfernandez/.ivy2/jars
:: loading settings :: url = jar:file:/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-14e96786-667c-4a98-a5dc-282052e00b50;1.0
confs: [default]
found com.microsoft.ml.spark#mmlspark_2.11;0.18.1 in central
found org.scalactic#scalactic_2.11;3.0.5 in central
found org.scala-lang#scala-reflect;2.11.12 in central
found org.scalatest#scalatest_2.11;3.0.5 in central
found org.scala-lang.modules#scala-xml_2.11;1.0.6 in central
found io.spray#spray-json_2.11;1.3.2 in central
found com.microsoft.cntk#cntk;2.4 in central
found org.openpnp#opencv;3.2.0-1 in central
found com.jcraft#jsch;0.1.54 in central
found org.apache.httpcomponents#httpclient;4.5.6 in central
found org.apache.httpcomponents#httpcore;4.4.10 in central
found commons-logging#commons-logging;1.2 in central
found commons-codec#commons-codec;1.10 in central
found com.microsoft.ml.lightgbm#lightgbmlib;2.2.350 in central
found com.github.vowpalwabbit#vw-jni;8.7.0.2 in central
:: resolution report :: resolve 602ms :: artifacts dl 16ms
:: modules in use:
com.github.vowpalwabbit#vw-jni;8.7.0.2 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.microsoft.cntk#cntk;2.4 from central in [default]
com.microsoft.ml.lightgbm#lightgbmlib;2.2.350 from central in [default]
com.microsoft.ml.spark#mmlspark_2.11;0.18.1 from central in [default]
commons-codec#commons-codec;1.10 from central in [default]
commons-logging#commons-logging;1.2 from central in [default]
io.spray#spray-json_2.11;1.3.2 from central in [default]
org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
org.openpnp#opencv;3.2.0-1 from central in [default]
org.scala-lang#scala-reflect;2.11.12 from central in [default]
org.scala-lang.modules#scala-xml_2.11;1.0.6 from central in [default]
org.scalactic#scalactic_2.11;3.0.5 from central in [default]
org.scalatest#scalatest_2.11;3.0.5 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 15 | 0 | 0 | 0 || 15 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-14e96786-667c-4a98-a5dc-282052e00b50
confs: [default]
0 artifacts copied, 15 already retrieved (0kB/23ms)
2019-09-26 16:55:20 WARN Utils:66 - Your hostname, VM-Ubuntu resolves to a loopback address: 127.0.0.1; using 10.250.5.125 instead (on interface eth0)
2019-09-26 16:55:20 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2019-09-26 16:55:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.3.2
/_/
Using Python version 3.6.6 (default, Oct 9 2018 12:34:16)
SparkSession available as 'spark'.
>>> from mmlspark.lightgbm import LightGBMClassifier
>>> LightGBMClassifier
<class 'mmlspark.lightgbm.LightGBMClassifier.LightGBMClassifier'>
@Keyeoh sorry, I meant does the load method work if you first import lightgbm:
foo = ml.PipelineModel.load('profiling/model')
I am afraid it doesn't:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.3.2
/_/
Using Python version 3.6.6 (default, Oct 9 2018 12:34:16)
SparkSession available as 'spark'.
>>> import pyspark.ml as ml
>>> from mmlspark.lightgbm import LightGBMClassifier
>>> foo = ml.PipelineModel.load('profiling/model')
2019-09-27 08:25:34 WARN SparkContext:66 - Using an existing SparkContext; some configuration may not take effect.
2019-09-27 08:25:38 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/util.py", line 311, in load
return cls.read().load(path)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/pipeline.py", line 244, in load
uid, stages = PipelineSharedReadWrite.load(metadata, self.sc, path)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/pipeline.py", line 378, in load
stage = DefaultParamsReader.loadParamsInstance(stagePath, sc)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/util.py", line 535, in loadParamsInstance
py_type = DefaultParamsReader.__get_class(pythonClassName)
File "/mnt/data/gfernandez/anaconda3/envs/ninabrlong/lib/python3.6/site-packages/pyspark/ml/util.py", line 478, in __get_class
m = getattr(m, comp)
AttributeError: module 'com.microsoft.ml.spark.lightgbm' has no attribute 'LightGBMRegressionModel'
I am wondering if I am getting the right version of the mmlspark package. The monkey patch you were referring to was included in 0.18.1, wasn't it?
I am also getting this with com.microsoft.ml.spark:mmlspark_2.11:0.18.1. In one step I train the model and save it, and in another step I load it. The load step fails. This is with Databricks runtime 6.0.x-scala2.11.
First step:
from pyspark.ml.evaluation import RegressionEvaluator
re = RegressionEvaluator(predictionCol="prediction", labelCol="RegLabel", metricName="rmse")
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from mmlspark.lightgbm import LightGBMRegressor
regressor = LightGBMRegressor(numIterations=100,
                              labelCol="RegLabel",
                              featuresCol="Features",
                              useBarrierExecutionMode=False)
pg = ParamGridBuilder()\
    .addGrid(regressor.learningRate, [0.15])\
    .addGrid(regressor.numLeaves, [1000])\
    .build()
cv = CrossValidator(estimator = regressor,
                    estimatorParamMaps = pg,
                    evaluator = re,
                    numFolds = 5)
cv_model = cv.fit(reg_features_df)
cv_model.write().save(reg_model)
Second step:
from pyspark.ml.tuning import CrossValidatorModel
reg_model = CrossValidatorModel.read().load(reg_model_path)
Fails at CrossValidatorModel.read().load() with the following error:
I'm experiencing the same issue. This code gets me past it for the time being.
from pyspark.ml.tuning import CrossValidatorModel
from pyspark.ml.util import DefaultParamsReader

try:
    from unittest import mock
except ImportError:
    # For Python 2 you might have to pip install mock
    import mock

mangled_name = '_DefaultParamsReader__get_class'
prev_get_clazz = getattr(DefaultParamsReader, mangled_name)

def __get_class(clazz):
    try:
        return prev_get_clazz(clazz)
    except AttributeError as outer:
        try:
            alt_clazz = clazz.replace('com.microsoft.ml.spark', 'mmlspark')
            return prev_get_clazz(alt_clazz)
        except AttributeError:
            raise outer

# replace a private method inside spark to let mmlspark load its own classes
with mock.patch.object(DefaultParamsReader, mangled_name, __get_class):
    # load the model
    model = CrossValidatorModel.read().load(reg_model_path)
Here's another version that's slightly more cleaned up & easier to reuse.
First, the reusable part:
from pyspark.ml.util import DefaultParamsReader

try:
    from unittest import mock
except ImportError:
    # For Python 2 you might have to pip install mock
    import mock

class MmlShim(object):
    mangled_name = '_DefaultParamsReader__get_class'
    prev_get_clazz = getattr(DefaultParamsReader, mangled_name)

    @classmethod
    def __get_class(cls, clazz):
        try:
            return cls.prev_get_clazz(clazz)
        except AttributeError as outer:
            try:
                alt_clazz = clazz.replace('com.microsoft.ml.spark', 'mmlspark')
                return cls.prev_get_clazz(alt_clazz)
            except AttributeError:
                raise outer

    def __enter__(self):
        self.mock = mock.patch.object(DefaultParamsReader, self.mangled_name, self.__get_class)
        self.mock.__enter__()
        return self

    def __exit__(self, *exc_info):
        self.mock.__exit__(*exc_info)
Then, to use it:
with MmlShim():
    model = CrossValidatorModel.read().load(reg_model_path)
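For anyone wondering why the workaround targets the odd-looking name '_DefaultParamsReader__get_class', and whether the patch lingers after the with block, here is a small toy sketch that runs without Spark. The Reader class is a hypothetical stand-in for DefaultParamsReader, and its "resolution" is faked with strings, so this only illustrates the patching technique, not the real loading code.

```python
from unittest import mock


class Reader(object):
    """Hypothetical stand-in for pyspark's DefaultParamsReader."""

    @staticmethod
    def __get_class(name):
        # Pretend that only Python-style names can be resolved.
        if name.startswith("com."):
            raise AttributeError("cannot resolve " + name)
        return "resolved:" + name

    @classmethod
    def load(cls, name):
        # Inside the class body, __get_class is compiled to _Reader__get_class.
        return cls.__get_class(name)


# From outside the class, the private method is only reachable by its mangled name.
MANGLED = "_Reader__get_class"
original = getattr(Reader, MANGLED)


def patched_get_class(name):
    """Try the original lookup, then retry with the Python package prefix."""
    try:
        return original(name)
    except AttributeError:
        return original(name.replace("com.microsoft.ml.spark", "mmlspark"))


scala_name = "com.microsoft.ml.spark.lightgbm.LightGBMRegressionModel"

# The replacement is active only inside the with block.
with mock.patch.object(Reader, MANGLED, patched_get_class):
    inside = Reader.load(scala_name)

# After the block, the original behaviour (and the original failure) is back.
try:
    Reader.load(scala_name)
    restored = False
except AttributeError:
    restored = True

print(inside)
print(restored)
```

Because mock.patch.object restores the attribute on exit, the shim does not permanently alter Spark's reader, which is probably the main reason to prefer a context manager over a plain setattr.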
Hey, thanks for providing the workaround code. I'm just wondering, what is the mock object in your example code?
My apologies, I forgot my imports (I've updated the comment to reflect):
from unittest import mock
Thanks so much! It's helpful.
I have tested the code from @tkellogg and can confirm that my code is working now thanks to his context manager.
Thanks a lot!
I have a similar issue when loading LightGBM models using MLflow. It seems that after importing lightgbm first, the model loads properly.
from mmlspark.lightgbm import LightGBMRanker
I think it is the namespace issue.
ModuleNotFoundError: No module named 'mmlspark.lightgbm._LightGBMRegressor'
Developers, can you help us?
I built a pipeline with my ETL transformers, some CountVectorizer stages, and a LightGBMRegressor. I cannot reload the PipelineModel; it fails with "no module name 'com.microsoft.......'". How can I debug it?
@tkellogg 's solution worked for me
Hi,
I am trying to port my ML pipeline so I can use LightGBM instead of the PySpark GBT. I have been able to design a Pipeline with a LightGBM as final estimator. Once trained, I save the PipelineModel object to disk successfully.
Problem is, when I want to load the model again to evaluate it, the following error appears:
I could not find any reference to this error, and I do not have a clue what could be happening. Besides, I found some references in your docs about using saveNativeModel(), but I do not know how that fits into a whole-pipeline-saving scenario.
I am using mmlspark 0.17 and pyspark 2.3.2 in standalone mode in my local development environment.
I looked into the saved model file and found the following structure:
Any hint or help would be much appreciated.
Regards, Gus.