microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

[BUG] Unable to save a trained Isolation Forest model in SynapseML #2094

Open shibuya-phys opened 10 months ago

shibuya-phys commented 10 months ago

SynapseML version

0.10.1

System information

Describe the problem

I'm trying to save an Isolation Forest model after training it in SynapseML. However, the save call fails with an error and the model is not written.

Code to reproduce issue

# Building a model
from synapse.ml.isolationforest import IsolationForest
from pyspark.ml.feature import VectorAssembler

# Isolation Forest parameters
contamination = 0.021
num_estimators = 100
max_samples = 100
max_features = 1.0

# Model Setup
isolationForest = (
    IsolationForest()
    .setNumEstimators(num_estimators)
    .setBootstrap(False)
    .setMaxSamples(max_samples)
    .setMaxFeatures(max_features)
    .setFeaturesCol("features")
    .setPredictionCol("predictedLabel")
    .setScoreCol("outlierScore")
    .setContamination(contamination)
    .setContaminationError(0.01 * contamination)
    .setRandomSeed(1)
)

# Training (inputCols, sdf_train, and sdf_test are defined elsewhere and not shown here)
va = VectorAssembler(inputCols=inputCols, outputCol="features")
train_data = va.transform(sdf_train)
model_isolationforest_trained = isolationForest.fit(train_data)

# Predictions
test_data = va.transform(sdf_test)
pred = model_isolationforest_trained.transform(test_data)

# Saving
model_isolationforest_trained.write().overwrite().save("path")

Other info / logs

The relevant part of the error raised by model_isolationforest_trained.write().overwrite().save("path") is:

Output error:
Py4JJavaError: An error occurred while calling o3919.save: org.apache.spark.SparkException: Job aborted. ~~

Caused by: java.lang.NoSuchMethodError: 'scala.Function1 org.apache.spark.sql.execution.datasources.DataSourceUtils$.createDateRebaseFuncInWrite(scala.Enumeration$Value, java.lang.String)' ~~

Does this error mean that a trained Isolation Forest model cannot be saved in SynapseML? As a side note, I confirmed that the save method works with LightGBMClassifier in SynapseML. I would appreciate it if someone could suggest a solution.
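
A java.lang.NoSuchMethodError is typically raised when code compiled against one version of a class runs against a different version in which that method is missing. One thing that may be worth checking is therefore whether the cluster's Spark version is on the same major.minor line as the Spark version the installed SynapseML build targets. The following is only a minimal sketch of such a check: the helper names major_minor and same_spark_line are hypothetical, and the expected version string is a placeholder that would have to be taken from the SynapseML release notes, since this issue does not confirm it.

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '3.2.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_spark_line(runtime_version: str, expected_version: str) -> bool:
    """Return True when both versions share the same major.minor line."""
    return major_minor(runtime_version) == major_minor(expected_version)


# Usage in a live session, assuming `spark` is an active SparkSession
# and "X.Y.Z" is replaced by the version the installed build targets:
#   print(spark.version)
#   print(same_spark_line(spark.version, "X.Y.Z"))
```

If the two versions are on different lines, a binary incompatibility in one of the bundled dependencies would be one plausible explanation for the error, though this check alone cannot confirm it.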

github-actions[bot] commented 10 months ago

Hey @shibuya-phys :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.