salesforce / TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
https://transmogrif.ai
BSD 3-Clause "New" or "Revised" License

java.lang.RuntimeException in Bootstrap your First Project #408

Open shoafj7917 opened 5 years ago

shoafj7917 commented 5 years ago

Describe the bug: After auto-generating a project, training the data always fails with the same java.lang.RuntimeException.

To Reproduce: Run the command as given in the README.txt:

./gradlew -q sparkSubmit -Dmain=com.salesforce.app.Titanic -Dargs="--run-type=train --model-location /home/TransmogrifAI/./titanic/build/spark/model 
--read-location Passenger=/home/TransmogrifAI/test-data/PassengerDataAll.csv"

Logs or screenshots

Using properties file: null
Parsed arguments:
  master                  local[*]
  deployMode              client
  executorMemory          2G
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            4G
  driverCores             1
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.salesforce.app.Titanic
  primaryResource         file:/home/Desktop/TransmogrifAI/titanic/build/install/titanic/lib/titanic-0.0.1.jar
  name                    titanic:com.salesforce.app.Titanic
  childArgs               [--run-type=train --model-location /home/Desktop/TransmogrifAI/./titanic/build/spark/model --read-location Passenger=/home/Desktop/TransmogrifAI/test-data/PassengerDataAll.csv]
...
...

19/09/13 00:08:10 INFO Titanic$: Parsed config:
{
  "runType" : "Train",
  "defaultParams" : {
    "stageParams" : { },
    "readerParams" : { },
    "customParams" : { },
    "alternateReaderParams" : { }
  },
  "readLocations" : {
    "Passenger" : "/home/Desktop/TransmogrifAI/test-data/PassengerDataAll.csv"
  },
  "modelLocation" : "/home/Desktop/TransmogrifAI/./titanic/build/spark/model"
}

Exception in thread "main" java.lang.RuntimeException: Failed to write out stage 'FeatureGeneratorStage_000000000005'
        at com.salesforce.op.stages.OpPipelineStageWriter.writeToJson(OpPipelineStageWriter.scala:81)
        at com.salesforce.op.OpWorkflowModelWriter$$anonfun$3.apply(OpWorkflowModelWriter.scala:131)
        at com.salesforce.op.OpWorkflowModelWriter$$anonfun$3.apply(OpWorkflowModelWriter.scala:131)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at com.salesforce.op.OpWorkflowModelWriter.stagesJArray(OpWorkflowModelWriter.scala:131)
        at com.salesforce.op.OpWorkflowModelWriter.stagesJArray(OpWorkflowModelWriter.scala:108)
        at com.salesforce.op.OpWorkflowModelWriter.toJson(OpWorkflowModelWriter.scala:83)
        at com.salesforce.op.OpWorkflowModelWriter.toJsonString(OpWorkflowModelWriter.scala:68)
        at com.salesforce.op.OpWorkflowModelWriter.saveImpl(OpWorkflowModelWriter.scala:58)
        at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
        at com.salesforce.op.OpWorkflowModelWriter$.save(OpWorkflowModelWriter.scala:193)
        at com.salesforce.op.OpWorkflowModel.save(OpWorkflowModel.scala:221)
        at com.salesforce.op.OpWorkflowRunner.train(OpWorkflowRunner.scala:165)
        at com.salesforce.op.OpWorkflowRunner.run(OpWorkflowRunner.scala:308)
        at com.salesforce.op.OpAppWithRunner.run(OpApp.scala:211)
        at com.salesforce.op.OpApp.main(OpApp.scala:182)
        at com.salesforce.app.Titanic.main(Titanic.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Argument 'extractFn' [com.salesforce.app.Features$$anonfun$5] cannot be serialized. Make sure com.salesforce.app.Features$$anonfun$5 has either no-args ctor or is an object, and does not have any external dependencies, e.g. use any out of scope variables.
        at com.salesforce.op.stages.OpPipelineStageSerializationFuns$class.serializeArgument(OpPipelineStageReaderWriter.scala:234)
        at com.salesforce.op.stages.DefaultValueReaderWriter.serializeArgument(DefaultValueReaderWriter.scala:48)
        at com.salesforce.op.stages.DefaultValueReaderWriter$$anonfun$write$1.apply(DefaultValueReaderWriter.scala:70)
        at com.salesforce.op.stages.DefaultValueReaderWriter$$anonfun$write$1.apply(DefaultValueReaderWriter.scala:69)
        at scala.util.Try$.apply(Try.scala:192)
        at com.salesforce.op.stages.DefaultValueReaderWriter.write(DefaultValueReaderWriter.scala:69)
        at com.salesforce.op.stages.FeatureGeneratorStageReaderWriter.write(FeatureGeneratorStage.scala:189)
        at com.salesforce.op.stages.FeatureGeneratorStageReaderWriter.write(FeatureGeneratorStage.scala:129)
        at com.salesforce.op.stages.OpPipelineStageWriter.writeToJson(OpPipelineStageWriter.scala:80)
        ... 31 more
Caused by: java.lang.RuntimeException: Failed to create an instance of class 'com.salesforce.app.Features$$anonfun$5'. Class has to either have a no-args ctor or be an object.
        at com.salesforce.op.utils.reflection.ReflectionUtils$.newInstance(ReflectionUtils.scala:106)
        at com.salesforce.op.utils.reflection.ReflectionUtils$.newInstance(ReflectionUtils.scala:87)
        at com.salesforce.op.stages.OpPipelineStageSerializationFuns$class.serializeArgument(OpPipelineStageReaderWriter.scala:231)
        ... 39 more
Caused by: java.lang.NoSuchFieldException: MODULE$
        at java.lang.Class.getField(Class.java:1703)
        at com.salesforce.op.utils.reflection.ReflectionUtils$.newInstance(ReflectionUtils.scala:102)
        ... 41 more

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sparkSubmit'.
> Process 'command '/home/spark-2.3.3-bin-hadoop2.7/bin/spark-submit'' finished with non-zero exit value 1

Additional context: I am on Ubuntu 18.04 with spark-2.3.3-bin-hadoop2.7, OpenJDK 1.8.0_222, and TransmogrifAI 0.6.1.

Hoping for your help. Thanks!

tovbinm commented 5 years ago

Thank you for reporting this. This is definitely a bug related to the recent serialization changes we made for our models. We will try to fix it asap.

In the meantime, you can try using a 0.5.x version.
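
For context: as the error message above says, the writer needs 'extractFn' to be re-creatable reflectively — either a Scala object or a class with a no-args constructor. A minimal, hypothetical illustration in plain Scala (not TransmogrifAI code) of why an anonymous, capturing function fails that check:

// Hypothetical, self-contained illustration (not TransmogrifAI internals): reflective
// re-creation needs a no-args constructor (or a Scala object's static MODULE$ field).

// A concrete top-level class: public no-args constructor, can be re-created by name.
class ConcreteExtract extends (Int => Int) {
  def apply(x: Int): Int = x + 1
}

class Outer(bias: Int) {
  // This anonymous function captures state from the enclosing instance, so its compiled
  // class has neither a no-args constructor nor a MODULE$ field.
  val capturingExtract: Int => Int = x => x + bias
}

object ReflectionIllustration extends App {
  // Succeeds: the same kind of reflective instantiation the stack trace above attempts.
  val ok = classOf[ConcreteExtract].getDeclaredConstructor().newInstance()
  println(ok(41)) // 42

  // Fails: there is no no-args constructor to call, which is what the error above is about.
  val lambda = new Outer(1).capturingExtract
  scala.util.Try(lambda.getClass.getDeclaredConstructor().newInstance())
    .failed.foreach(e => println(s"cannot re-instantiate: $e"))
}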

shoafj7917 commented 5 years ago

Is there a timeline for when this will be fixed?

tovbinm commented 5 years ago

This requires changing our code generator templates for feature engineering in the CLI so that it generates concrete feature extractor classes. Perhaps @vpatryshev or @wsuchy can have a look?

vpatryshev commented 5 years ago

Sure can do; can I have more details?

tovbinm commented 5 years ago

@vpatryshev it's similar to what @gerashegalov did in this PR: https://github.com/salesforce/TransmogrifAI/pull/406

For example, this extractor code with an anonymous function:

val rowId = FeatureBuilder.Integral[BostonHouse].extract(_.rowId.toIntegral).asPredictor

has to be replaced with a concrete class:

val rowId = FeatureBuilder.Integral[BostonHouse].extract(new BostonFeatures.RowId).asPredictor

object BostonFeatures {
    // Concrete extractor: takes the raw extraction function as a constructor argument.
    // (BostonFeatureFunc[T] is assumed here to be an alias for BostonHouse => T.)
    class IntegralExtract(f: BostonHouse => Int) extends BostonFeatureFunc[Integral] {
        override def apply(v1: BostonHouse): Integral = f(v1).toIntegral
    }
    // No-args concrete class standing in for the anonymous function, so it can be
    // serialized by class name and re-created reflectively.
    class RowId extends IntegralExtract(_.rowId)
}
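
Until the templates are regenerated, the same pattern can be applied by hand in the generated project's Features.scala. A hedged sketch with a placeholder Passenger case class and a hypothetical pClass field (the actual generated schema and field accessors will differ):

import com.salesforce.op.features.FeatureBuilder
import com.salesforce.op.features.types._

object PassengerFeatures {
    // Placeholder for the generated Passenger record -- adapt the fields to your schema.
    case class Passenger(id: Int, pClass: Int)

    // Concrete extractor: nested in an object, captures no outer scope, and has a
    // no-args subclass below, so the stage writer can serialize it by class name.
    class IntegralExtract(f: Passenger => Int) extends (Passenger => Integral) with Serializable {
        override def apply(p: Passenger): Integral = f(p).toIntegral
    }
    class PClassExtract extends IntegralExtract(_.pClass)

    // Generated code (fails to write out):
    //   val pClass = FeatureBuilder.Integral[Passenger].extract(_.pClass.toIntegral).asPredictor
    // Workaround -- concrete class instead of an anonymous function:
    val pClass = FeatureBuilder.Integral[Passenger].extract(new PClassExtract).asPredictor
}
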
tovbinm commented 4 years ago

@vpatryshev any progress on this one? thanks!

vpatryshev commented 4 years ago

Oh, I have not even touched it yet. I keep it in mind all the time, though.

vpatryshev commented 4 years ago

FYI. Working on it now.

vpatryshev commented 4 years ago

@shoafj7917, can you please check whether you can reproduce this with the latest version of TransmogrifAI? The command you posted is not runnable on the current version, and I don't see how to reproduce this behavior. Which directory do you run it in?

jeesim2 commented 4 years ago

When I tried with version 0.6.1, it failed in the same manner as @shoafj7917 described.

@vpatryshev, does the latest version of TransmogrifAI mean the master branch?

The master branch fails with the following error.

In short:

 Could not find com.salesforce.transmogrifai:transmogrifai-core_2.11:0.6.2-SNAPSHOT.

In full:

15:47:36.118 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * What went wrong:
15:47:36.118 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Execution failed for task ':compileJava'.
15:47:36.118 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] > Could not resolve all files for configuration ':compileClasspath'.
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]    > Could not find com.salesforce.transmogrifai:transmogrifai-core_2.11:0.6.2-SNAPSHOT.
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]      Searched in the following locations:
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]        - https://jcenter.bintray.com/com/salesforce/transmogrifai/transmogrifai-core_2.11/0.6.2-SNAPSHOT/maven-metadata.xml
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]        - https://jcenter.bintray.com/com/salesforce/transmogrifai/transmogrifai-core_2.11/0.6.2-SNAPSHOT/transmogrifai-core_2.11-0.6.2-SNAPSHOT.pom
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]        - https://jcenter.bintray.com/com/salesforce/transmogrifai/transmogrifai-core_2.11/0.6.2-SNAPSHOT/transmogrifai-core_2.11-0.6.2-SNAPSHOT.jar
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]        - https://repo.maven.apache.org/maven2/com/salesforce/transmogrifai/transmogrifai-core_2.11/0.6.2-SNAPSHOT/maven-metadata.xml
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]        - https://repo.maven.apache.org/maven2/com/salesforce/transmogrifai/transmogrifai-core_2.11/0.6.2-SNAPSHOT/transmogrifai-core_2.11-0.6.2-SNAPSHOT.pom
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]        - https://repo.maven.apache.org/maven2/com/salesforce/transmogrifai/transmogrifai-core_2.11/0.6.2-SNAPSHOT/transmogrifai-core_2.11-0.6.2-SNAPSHOT.jar
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]      Required by:
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]          project :
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] * Try:
15:47:36.119 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Run with --stacktrace option to get the stack trace.  Run with --scan to get full insights.
15:47:36.120 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter]
tovbinm commented 4 years ago

The master version has not been published yet, but you can publish it locally:

$ git clone git@github.com:salesforce/TransmogrifAI.git
$ cd TransmogrifAI
$ ./gradlew publishToMavenLocal
jeesim2 commented 4 years ago

Even after installing the master branch into the local Maven repository and adding mavenLocal() to the generated bootstrap project's build.gradle file, the bootstrap project's training still fails as follows:

nojihun-ui-MacBook-Pro:titanic jihun$ ./gradlew sparkSubmit -Dmain=com.salesforce.app.Titanic -Dargs="--run-type=train --model-location=/tmp/titanic-model --read-location Passenger=`pwd`/../test-data/PassengerDataAll.csv"

> Task :sparkSubmit
Using properties file: null
Parsed arguments:
  master                  local[*]
  deployMode              client
  executorMemory          2G
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            4G
  driverCores             1
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.salesforce.app.Titanic
  primaryResource         file:/work_base/git_repo/TransmogrifAI/titanic/build/install/titanic/lib/titanic-0.0.1.jar
  name                    titanic:com.salesforce.app.Titanic
  childArgs               [--run-type=train --model-location=/tmp/titanic-model --read-location Passenger=/work_base/git_repo/TransmogrifAI/titanic/../test-data/PassengerDataAll.csv]
  jars                    ...........
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file null:
  (spark.driver.memory,4G)
  (spark.serializer,org.apache.spark.serializer.KryoSerializer)

Main class:
com.salesforce.app.Titanic
Arguments:
--run-type=train
--model-location=/tmp/titanic-model
--read-location
Passenger=/work_base/git_repo/TransmogrifAI/titanic/../test-data/PassengerDataAll.csv
Spark config:
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.jars.............
20/01/30 06:30:03 INFO Titanic$: Parsed config:
{
  "runType" : "Train",
  "defaultParams" : {
    "stageParams" : { },
    "readerParams" : { },
    "customParams" : { },
    "alternateReaderParams" : { }
  },
  "readLocations" : {
    "Passenger" : "/work_base/git_repo/TransmogrifAI/titanic/../test-data/PassengerDataAll.csv"
  },
  "modelLocation" : "/tmp/titanic-model"
}
Exception in thread "main" java.lang.RuntimeException: Failed to write out stage 'FeatureGeneratorStage_000000000005'
        at com.salesforce.op.stages.OpPipelineStageWriter.writeToJson(OpPipelineStageWriter.scala:81)
        at com.salesforce.op.OpWorkflowModelWriter$$anonfun$3.apply(OpWorkflowModelWriter.scala:131)
        at com.salesforce.op.OpWorkflowModelWriter$$anonfun$3.apply(OpWorkflowModelWriter.scala:131)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at com.salesforce.op.OpWorkflowModelWriter.stagesJArray(OpWorkflowModelWriter.scala:131)
        at com.salesforce.op.OpWorkflowModelWriter.stagesJArray(OpWorkflowModelWriter.scala:108)
        at com.salesforce.op.OpWorkflowModelWriter.toJson(OpWorkflowModelWriter.scala:83)
        at com.salesforce.op.OpWorkflowModelWriter.toJsonString(OpWorkflowModelWriter.scala:68)
        at com.salesforce.op.OpWorkflowModelWriter.saveImpl(OpWorkflowModelWriter.scala:58)
        at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
        at com.salesforce.op.OpWorkflowModelWriter$.save(OpWorkflowModelWriter.scala:193)
        at com.salesforce.op.OpWorkflowModel.save(OpWorkflowModel.scala:221)
        at com.salesforce.op.OpWorkflowRunner.train(OpWorkflowRunner.scala:165)
        at com.salesforce.op.OpWorkflowRunner.run(OpWorkflowRunner.scala:308)
        at com.salesforce.op.OpAppWithRunner.run(OpApp.scala:211)
        at com.salesforce.op.OpApp.main(OpApp.scala:182)
        at com.salesforce.app.Titanic.main(Titanic.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Argument 'extractFn' [com.salesforce.app.Features$$anonfun$5] cannot be serialized. Make sure com.salesforce.app.Features$$anonfun$5 has either no-args ctor or is an object, and does not have any external dependencies, e.g. use any out of scope variables.
        at com.salesforce.op.stages.OpPipelineStageSerializationFuns$class.serializeArgument(OpPipelineStageReaderWriter.scala:236)
        at com.salesforce.op.stages.DefaultValueReaderWriter.serializeArgument(DefaultValueReaderWriter.scala:48)
        at com.salesforce.op.stages.DefaultValueReaderWriter$$anonfun$write$1.apply(DefaultValueReaderWriter.scala:70)
        at com.salesforce.op.stages.DefaultValueReaderWriter$$anonfun$write$1.apply(DefaultValueReaderWriter.scala:69)
        at scala.util.Try$.apply(Try.scala:192)
        at com.salesforce.op.stages.DefaultValueReaderWriter.write(DefaultValueReaderWriter.scala:69)
        at com.salesforce.op.stages.FeatureGeneratorStageReaderWriter.write(FeatureGeneratorStage.scala:189)
        at com.salesforce.op.stages.FeatureGeneratorStageReaderWriter.write(FeatureGeneratorStage.scala:129)
        at com.salesforce.op.stages.OpPipelineStageWriter.writeToJson(OpPipelineStageWriter.scala:80)
        ... 31 more
Caused by: java.lang.RuntimeException: Failed to create an instance of class 'com.salesforce.app.Features$$anonfun$5'. Class has to either have a no-args ctor or be an object.
        at com.salesforce.op.utils.reflection.ReflectionUtils$.newInstance(ReflectionUtils.scala:106)
        at com.salesforce.op.utils.reflection.ReflectionUtils$.newInstance(ReflectionUtils.scala:87)
        at com.salesforce.op.stages.OpPipelineStageSerializationFuns$class.serializeArgument(OpPipelineStageReaderWriter.scala:233)
        ... 39 more
Caused by: java.lang.NoSuchFieldException: MODULE$
        at java.lang.Class.getField(Class.java:1703)
        at com.salesforce.op.utils.reflection.ReflectionUtils$.newInstance(ReflectionUtils.scala:102)
        ... 41 more

> Task :sparkSubmit FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':sparkSubmit'.
> Process 'command '/Users/jihun/apps/spark/bin/spark-submit'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 8m 0s
7 actionable tasks: 2 executed, 5 up-to-date
nojihun-ui-MacBook-Pro:titanic jihun$
tovbinm commented 3 years ago

In order to fix this, we would need to modify the template used to generate the project. Until it is fixed, I would recommend starting with the existing examples and creating your project manually in a similar fashion.
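
For reference, a hedged sketch of such a manually created project, modeled loosely on the repository's helloworld Titanic example and the concrete-extractor pattern discussed above; the Passenger fields, reader settings, and paths are placeholders to adapt:

import com.salesforce.op._
import com.salesforce.op.features.FeatureBuilder
import com.salesforce.op.features.types._
import com.salesforce.op.readers.DataReaders
import com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelector
import org.apache.spark.sql.SparkSession

object ManualTitanicTrain {

  // Placeholder schema -- mirror the actual columns of PassengerDataAll.csv.
  case class Passenger(id: Int, survived: Int, age: Option[Double], sex: Option[String])

  // Concrete extractors (no anonymous functions), so the model writer can serialize them.
  class SurvivedExtract extends (Passenger => RealNN) with Serializable {
    override def apply(p: Passenger): RealNN = p.survived.toRealNN
  }
  class AgeExtract extends (Passenger => Real) with Serializable {
    override def apply(p: Passenger): Real = p.age.toReal
  }
  class SexExtract extends (Passenger => PickList) with Serializable {
    override def apply(p: Passenger): PickList = p.sex.toPickList
  }

  def main(args: Array[String]): Unit = {
    implicit val spark: SparkSession =
      SparkSession.builder().master("local[*]").appName("ManualTitanicTrain").getOrCreate()
    import spark.implicits._

    val survived = FeatureBuilder.RealNN[Passenger].extract(new SurvivedExtract).asResponse
    val age = FeatureBuilder.Real[Passenger].extract(new AgeExtract).asPredictor
    val sex = FeatureBuilder.PickList[Passenger].extract(new SexExtract).asPredictor

    // Automated feature engineering over the predictors, then automated model selection.
    val featureVector = Seq(age, sex).transmogrify()
    val prediction = BinaryClassificationModelSelector().setInput(survived, featureVector).getOutput()

    // Reader and training; the data and model paths are placeholders.
    val reader = DataReaders.Simple.csvCase[Passenger](
      path = Option("test-data/PassengerDataAll.csv"), key = _.id.toString)
    val model = new OpWorkflow().setResultFeatures(prediction).setReader(reader).train()
    model.save("/tmp/titanic-model")
    println(model.summaryPretty())
  }
}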