salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

Model saving and loading behavior changed since #475 (#514)

Closed by koertkuipers 3 years ago

koertkuipers commented 4 years ago

Describe the bug: the behavior of model.save and model.load has changed since #475. It seems that relative paths and Hadoop filesystem URLs are no longer supported.

To reproduce: on the master branch, make this change:

$ git diff
diff --git a/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala b/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala
index a89fcce2..3287e7a5 100644
--- a/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala
+++ b/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala
@@ -547,7 +547,7 @@ class OpWorkflowTest extends FlatSpec with PassengerSparkFixtureTest {
     val model = workflow.train()
     val expectedScoresDF = model.score()
     val expectedScores = expectedScoresDF.select(prediction.name, KeyFieldName).sort(KeyFieldName).collect()
-    model.save(workflowLocation)
+    model.save("testmodel")

     def assertModel(model: OpWorkflowModel): Assertion = {
       val scoresDF = model.setInputDataset(ds, keyFn).score()
@@ -557,10 +557,10 @@ class OpWorkflowTest extends FlatSpec with PassengerSparkFixtureTest {
     }

     withClue("Expected to load and score model with provided workflow: ") {
-      assertModel(model = workflow.loadModel(workflowLocation))
+      assertModel(model = workflow.loadModel("testmodel"))
     }
     withClue("Expected to load and score model without workflow: ") {
-      assertModel(model = OpWorkflowModel.load(workflowLocation))
+      assertModel(model = OpWorkflowModel.load("testmodel"))
     }
   }

Now the test fails with a stack trace that looks like this:

com.salesforce.op.OpWorkflowTest > OpWorkflow should train a model with features of all feature types, save, load and score it FAILED
    java.lang.NullPointerException
        at sun.nio.fs.UnixPath.normalizeAndCheck(UnixPath.java:77)
        at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
        at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
        at ml.combust.bundle.BundleFile$.apply(BundleFile.scala:59)
        at ml.combust.bundle.BundleFile$.apply(BundleFile.scala:40)
        at com.salesforce.op.stages.SparkStageParam$$anonfun$jsonEncode$1.apply(SparkStageParam.scala:93)

Similarly, if you make this change:

$ git diff
diff --git a/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala b/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala
index a89fcce2..cae54fcf 100644
--- a/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala
+++ b/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala
@@ -547,7 +547,7 @@ class OpWorkflowTest extends FlatSpec with PassengerSparkFixtureTest {
     val model = workflow.train()
     val expectedScoresDF = model.score()
     val expectedScores = expectedScoresDF.select(prediction.name, KeyFieldName).sort(KeyFieldName).collect()
-    model.save(workflowLocation)
+    model.save("file:///tmp/testmodel")

     def assertModel(model: OpWorkflowModel): Assertion = {
       val scoresDF = model.setInputDataset(ds, keyFn).score()
@@ -557,10 +557,10 @@ class OpWorkflowTest extends FlatSpec with PassengerSparkFixtureTest {
     }

     withClue("Expected to load and score model with provided workflow: ") {
-      assertModel(model = workflow.loadModel(workflowLocation))
+      assertModel(model = workflow.loadModel("file:///tmp/testmodel"))
     }
     withClue("Expected to load and score model without workflow: ") {
-      assertModel(model = OpWorkflowModel.load(workflowLocation))
+      assertModel(model = OpWorkflowModel.load("file:///tmp/testmodel"))
     }
   }

you get the same error.

Expected behavior: the changes above should not cause the test to fail. They do not if you check out the last commit before #475 was merged.

leahmcguire commented 4 years ago

Please point to the latest release (0.7.0) rather than the snapshot and you will not see this issue. The underlying cause is that we switched to MLeap serialization, which does not work with the Hadoop file system. We are working on a fix that saves to a local tmp file and then moves it to the final location; it will be in place before we cut our next release.
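
For anyone who needs a stopgap on the snapshot, the "save locally, then move" idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the fix that will ship: `LocalThenMove` and `saveViaLocalTemp` are hypothetical names, the `writeLocal` callback stands in for whatever serializer only understands local paths, and the destination is assumed not to exist yet. The only library calls used are the Hadoop `Path.getFileSystem`, `FileSystem.copyFromLocalFile` and `FileSystem.makeQualified` methods plus `java.nio.file.Files`.

```scala
import java.nio.file.{Files, Path => LocalPath}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{Path => HadoopPath}

// Hypothetical helper, not part of TransmogrifAI or MLeap.
object LocalThenMove {

  /**
   * Lets `writeLocal` write into a fresh local temp directory, then copies that
   * directory to `destination` through the Hadoop FileSystem API.
   * `destination` may be a relative path, a file:// URL or an hdfs:// URL.
   * Assumption: `destination` does not exist yet. If it already exists as a
   * directory, Hadoop nests the copy inside it instead of replacing it.
   */
  def saveViaLocalTemp(destination: String, conf: Configuration)
                      (writeLocal: LocalPath => Unit): HadoopPath = {
    // Stage everything on the local disk first.
    val tmpDir = Files.createTempDirectory("model-")
    writeLocal(tmpDir)

    // Resolve the target filesystem from the destination's scheme
    // (or the default filesystem when no scheme is given).
    val dst = new HadoopPath(destination)
    val fs = dst.getFileSystem(conf)

    // delSrc = true removes the local staging directory after the copy;
    // overwrite = true replaces files that already exist at the target.
    fs.copyFromLocalFile(true, true, new HadoopPath(tmpDir.toString), dst)
    fs.makeQualified(dst)
  }
}
```

A call might look like `LocalThenMove.saveViaLocalTemp("file:///tmp/testmodel", new Configuration()) { dir => /* write model files into dir */ }`. Loading would need the mirror image: copy from the Hadoop path into a local temp directory and read from there. The sketch does not clean up the staging directory if `writeLocal` throws.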