h2oai / sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0
967 stars, 360 forks

Could not initialize the interpreter & H2OClusterNotReachableException #2281

Closed — jayden526 closed this 4 years ago

jayden526 commented 4 years ago

Could you please help with a few issues? I tried to run a simple H2O GLM model in Spark, in three environments:

  1. Scalatest in IntelliJ
  2. Scalatest in Maven
  3. A Spark application on the cluster

The code is:

  import ai.h2o.sparkling.ml.algos.H2OGLM
  import org.apache.spark.h2o.{H2OConf, H2OContext}

  val spark = <SparkSession>
  val h2oContext = H2OContext.getOrCreate(spark)

  import h2oContext.implicits._
  val trainDF = <Spark DataFrame>
  val h2oModel = new H2OGLM()
    .setModelId("glm")
    .setNfolds(4)
    .setSeed(1)
    .setFeaturesCols("feature_1", "feature_2", "feature_3", "feature_4")
    .setLabelCol("label")
    .fit(trainDF)

  println(h2oModel.getModelDetails())
  val output = h2oModel.transform(trainDF.drop("label"))

The versions are: Scala 2.11.7, Spark 2.4, Hadoop 2.7, Sparkling Water 3.30.0.7-1-2.4. The Sparkling Water packages imported in Maven include sparkling-water-ml_2.11, sparkling-water-core_2.11, and sparkling-water-examples_2.11.

The code runs fine as a Scala unit test in IntelliJ, but fails as a Maven unit test even with the same configuration. The error in the Maven unit test is:

- GLM *** FAILED ***
java.lang.RuntimeException: Could not initialize the interpreter
at ai.h2o.sparkling.repl.BaseH2OInterpreter.initializeInterpreter(BaseH2OInterpreter.scala:133)
at ai.h2o.sparkling.repl.BaseH2OInterpreter.<init>(BaseH2OInterpreter.scala:265)
at ai.h2o.sparkling.repl.H2OInterpreter.<init>(H2OInterpreter.scala:41)
at ai.h2o.sparkling.backend.api.scalainterpreter.ScalaInterpreterServlet.ai$h2o$sparkling$backend$api$scalainterpreter$ScalaInterpreterServlet$$createInterpreterInPool(ScalaInterpreterServlet.scala:101)
at ai.h2o.sparkling.backend.api.scalainterpreter.ScalaInterpreterServlet$$anonfun$initializeInterpreterPool$1.apply(ScalaInterpreterServlet.scala:95)
at ai.h2o.sparkling.backend.api.scalainterpreter.ScalaInterpreterServlet$$anonfun$initializeInterpreterPool$1.apply(ScalaInterpreterServlet.scala:94)
at scala.collection.immutable.Range.foreach(Range.scala:166)
at ai.h2o.sparkling.backend.api.scalainterpreter.ScalaInterpreterServlet.initializeInterpreterPool(ScalaInterpreterServlet.scala:94)
at ai.h2o.sparkling.backend.api.scalainterpreter.ScalaInterpreterServlet.<init>(ScalaInterpreterServlet.scala:48)
at ai.h2o.sparkling.backend.api.scalainterpreter.ScalaInterpreterServlet$.getServlet(ScalaInterpreterServlet.scala:145)

The code also fails when submitted to the cluster, with the error below:

-ai.h2o.sparkling.backend.exceptions.H2OClusterNotReachableException: H2O cluster 10.110.162.38:54321 - sparkling-water-<application_id> is not reachable.
H2OContext has not been created.
at ai.h2o.sparkling.backend.utils.H2OContextExtensions$class.getAndVerifyWorkerNodes(H2OContextExtensions.scala:131)
at org.apache.spark.h2o.H2OContext.getAndVerifyWorkerNodes(H2OContext.scala:66)
at org.apache.spark.h2o.H2OContext.connectToH2OCluster(H2OContext.scala:388)
at org.apache.spark.h2o.H2OContext.<init>(H2OContext.scala:87)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:483)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:521)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:536)
at com.myTestPackage.H2OSparlingWater$.main(H2OSparlingWater.scala:30)
at com.myTestPackage.H2OSparlingWater.main(H2OSparlingWater.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
Caused by: ai.h2o.sparkling.backend.exceptions.RestApiCommunicationException: H2O node 10.110.162.38:54321 responded with
Status code: 405 : HTTP method POST is not supported by this URL

Server error:

Error 405 HTTP method POST is not supported by this URL

HTTP ERROR 405

Problem accessing /3/CloudLock. Reason:

    HTTP method POST is not supported by this URL

at ai.h2o.sparkling.backend.utils.RestCommunication$class.checkResponseCode(RestCommunication.scala:273)
at ai.h2o.sparkling.backend.utils.RestApiUtils$.checkResponseCode(RestApiUtils.scala:96)
at ai.h2o.sparkling.backend.utils.RestCommunication$class.readURLContent(RestCommunication.scala:252)
at ai.h2o.sparkling.backend.utils.RestApiUtils$.readURLContent(RestApiUtils.scala:96)
at ai.h2o.sparkling.backend.utils.RestCommunication$class.request(RestCommunication.scala:151)
at ai.h2o.sparkling.backend.utils.RestApiUtils$.request(RestApiUtils.scala:96)
at ai.h2o.sparkling.backend.utils.RestCommunication$class.update(RestCommunication.scala:75)
at ai.h2o.sparkling.backend.utils.RestApiUtils$.update(RestApiUtils.scala:96)
at ai.h2o.sparkling.backend.utils.H2OContextExtensions$class.lockCloud(H2OContextExtensions.scala:184)
at ai.h2o.sparkling.backend.utils.H2OContextExtensions$class.getAndVerifyWorkerNodes(H2OContextExtensions.scala:119)
... 13 more

Could you give me some insights into these two issues? Thanks a lot!

mn-mikke commented 4 years ago

Hi @jayden526, can you share the pom file you use for building your app and running tests? Or if you've uploaded the project somewhere, I could try to debug it locally.

jayden526 commented 4 years ago

Thanks @mn-mikke, below is the pom file I used.

[The `<properties>` block was flattened when pasted; the recoverable values include Scala binary version 2.11, Scala 2.11.7, spark-2.4.3-bin-hadoop2.7 / Spark 2.4.3, hadoop-2.7.3 / Hadoop 2.7.3, hive-2.3.6 / Hive 2.3.6, and `provided` scopes for the Spark dependencies, among other library and plugin versions.]
<repositories>
    <repository>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>interval:60</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </snapshots>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>interval:60</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </releases>
        <id>public</id>
        <url></url>
    </repository>
    <repository>
        <id>artima</id>
        <name>Artima Maven Repository</name>
        <url>http://repo.artima.com/releases</url>
    </repository>
</repositories>

<dependencies>
    <dependency>
        <groupId>org.json4s</groupId>
        <artifactId>json4s-native_${scala.sdk.version}</artifactId>
        <version>3.6.7</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.sdk.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.deps.scope}</scope>
    </dependency>
    <dependency>
        <groupId>com.intel.analytics.zoo</groupId>
        <artifactId>analytics-zoo-bigdl_0.7.1-spark_2.2.0</artifactId>
        <version>0.3.0</version>
    </dependency>

    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>sparkling-water-ml_2.11</artifactId>
        <version>3.30.0.7-1-2.4</version>
    </dependency>
    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>sparkling-water-examples_2.11</artifactId>
        <version>3.30.0.7-1-2.4</version>
    </dependency>
    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>sparkling-water-core_2.11</artifactId>
        <version>3.30.0.7-1-2.4</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>2.11.7</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-repl_${scala.sdk.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.deps.scope}</scope>
    </dependency>

    <dependency>
        <groupId>com.ning</groupId>
        <artifactId>async-http-client</artifactId>
        <version>1.6.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.sdk.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.deps.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.sdk.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.deps.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_${scala.sdk.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.deps.scope}</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.sdk.version}</artifactId>
        <version>${spark.version}</version>
        <scope>${spark.deps.scope}</scope>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>${google.guava.version}</version>
    </dependency>
    <dependency>
        <groupId>com.typesafe</groupId>
        <artifactId>config</artifactId>
        <version>${typesafe.config.version}</version>
    </dependency>
    <dependency>
        <groupId>org.scalactic</groupId>
        <artifactId>scalactic_${scala.sdk.version}</artifactId>
        <version>${scalactic.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_${scala.sdk.version}</artifactId>
        <version>${scalatest.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
    </dependency>
    <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
    </dependency>
    <dependency>
        <groupId>net.vclk</groupId>
        <artifactId>vclibjava</artifactId>
        <exclusions>
            <exclusion>
                <groupId>*</groupId>
                <artifactId>*</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>net.jpountz.lz4</groupId>
        <artifactId>lz4</artifactId>
        <version>${lz4.version}</version>
    </dependency>
    <dependency>
        <groupId>commons-codec</groupId>
        <artifactId>commons-codec</artifactId>
    </dependency>
    <dependency>
        <groupId>org.xerial</groupId>
        <artifactId>sqlite-jdbc</artifactId>
        <version>${sqlite.jdbc.version}</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.binary.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-core</artifactId>
        <version>${jackson.core.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>${jackson.core.version}</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-annotations</artifactId>
        <version>${jackson.core.version}</version>
    </dependency>
    <dependency>
        <groupId>org.jpmml</groupId>
        <artifactId>jpmml-sparkml</artifactId>
        <version>${jpmml-sparkml.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
        <version>2.13.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.13.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-scala</artifactId>
        <version>11.0</version>
        <type>pom</type>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api-scala_${scala.sdk.version}</artifactId>
        <version>11.0</version>
    </dependency>
    <dependency>
        <groupId>com.softwaremill.sttp.client</groupId>
        <artifactId>core_${scala.sdk.version}</artifactId>
        <version>${sttp.version}</version>
    </dependency>
</dependencies>

<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <configuration>
                    <args>
                        <arg>-language:implicitConversions</arg>
                    </args>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>test-jar</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.scalatest</groupId>
                <artifactId>scalatest-maven-plugin</artifactId>
                <configuration>
                    <forkMode>never</forkMode>
                    <parallel>false</parallel>
                    <threadCount>5</threadCount>
                    <skipTests>${scalaSkipTests}</skipTests>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-clean-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <filesets>
                        <fileset>
<directory>${project.basedir}/metastore_db</directory>
                            <followSymlinks>false</followSymlinks>
                        </fileset>
                        <fileset>
                            <directory>${project.basedir}</directory>
                            <!--DO NOT UNCOMMENT BELOW INCLUDES WITHOUT UNCOMMENTING ABOVE LINE -->
                            <includes>
                                <include>**/*.tmp</include>
                                <include>**/*.log</include>
                                <include>**/*.log.*</include>
                            </includes>
                            <followSymlinks>false</followSymlinks>
                        </fileset>
                    </filesets>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
    <plugins>
        <plugin>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
                <skip>true</skip>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-clean-plugin</artifactId>
        </plugin>
    </plugins>
</build>

jayden526 commented 4 years ago

Hi @mn-mikke , any idea of this? Thanks

mn-mikke commented 4 years ago

Hi @jayden526,

  1. IMHO, the Maven unit tests have a problem with the `provided` scope of the Spark dependencies.
  2. For running SW on a cluster, you need to make sure the SW libraries are on the classpath of every Spark executor. To achieve that, you can do one of the following:
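For illustration, one common way to get the SW libraries onto every executor's classpath is to pass them at submit time. This is a sketch, not a tested command: the jar path and application name are placeholders, and the class name is taken from the stack trace above.

```shell
# Option A: ship the Sparkling Water assembly jar with the application.
# The jar path is a placeholder for wherever the SW distribution is unpacked.
spark-submit \
  --class com.myTestPackage.H2OSparlingWater \
  --master yarn \
  --jars /path/to/sparkling-water-assembly_2.11-3.30.0.7-1-2.4-all.jar \
  my-application.jar

# Option B: let Spark resolve Sparkling Water from Maven coordinates.
spark-submit \
  --class com.myTestPackage.H2OSparlingWater \
  --master yarn \
  --packages ai.h2o:sparkling-water-package_2.11:3.30.0.7-1-2.4 \
  my-application.jar
```

Building a fat jar that bundles the SW dependencies is a third option with the same effect.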
jayden526 commented 4 years ago

@mn-mikke, thanks a lot! I'm now adding the Sparkling Water jar when submitting, and will see if it works.

jayden526 commented 4 years ago

@mn-mikke, it is able to run on the cluster now! However, it still fails in the Maven unit test (while succeeding in the IntelliJ unit test). An additional message is:

    [init] error: error while loading Object, Missing dependency 'object scala in compiler mirror', required by /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class)
    Failed to initialize compiler: object scala in compiler mirror not found.
    ** Note that as of 2.8 scala does not assume use of the java classpath.
    ** For the old behavior pass -usejavacp to scala, or if using a Settings
    ** object programmatically, settings.usejavacp.value = true.

Looks like it can't find the right Scala dependency, but I've never had this issue in any other unit test in this module. Do you think it's related to the SW library dependency? Thank you!

mn-mikke commented 4 years ago

Sparkling Water 3.30.0.7-1-2.4 was compiled with Scala 2.11.12. Can you try upgrading the Scala version in your project to 2.11.12?
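For example, aligning all Scala artifacts on 2.11.12 would look roughly like the fragment below. This is a sketch: the property names are illustrative (the pom above uses `${scala.sdk.version}` for the binary version, but the flattened properties make the full-version property name unrecoverable), so they should be matched to whatever the pom actually defines.

```xml
<!-- Sketch: pin every Scala artifact to the version SW was compiled with.
     Property names here are illustrative, not copied from the pom. -->
<properties>
  <scala.sdk.version>2.11</scala.sdk.version>
  <scala.version>2.11.12</scala.version>
</properties>

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-compiler</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
</dependency>
```

Note that the pom above hard-codes `2.11.7` in the scala-compiler dependency and uses `${scala.binary.version}` for scala-library, so both spots need the update.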

mn-mikke commented 4 years ago

Hi @jayden526, do you still have any problem or can I close this ticket?

jayden526 commented 4 years ago

@mn-mikke, sorry for the late reply, it didn't work for me last time and let me try it again to get an update.

jayden526 commented 4 years ago

Hi @mn-mikke, I tried Scala 2.11.12 in Maven, and now it gives a new error:

    java.lang.NumberFormatException: Not a version: 9
    at scala.util.PropertiesTrait$class.parts$1(Properties.scala:184)
    at scala.util.PropertiesTrait$class.isJavaAtLeast(Properties.scala:187)
    at scala.util.Properties$.isJavaAtLeast(Properties.scala:17)
    at scala.tools.util.PathResolverBase$Calculated$.javaBootClasspath(PathResolver.scala:276)
    at scala.tools.util.PathResolverBase$Calculated$.basis(PathResolver.scala:283)
    at scala.tools.util.PathResolverBase$Calculated$.containers$lzycompute(PathResolver.scala:293)
    at scala.tools.util.PathResolverBase$Calculated$.containers(PathResolver.scala:293)
    at scala.tools.util.PathResolverBase.containers(PathResolver.scala:309)
    at scala.tools.util.PathResolver.computeResult(PathResolver.scala:341)

I checked online resources, and it looks like this issue was resolved after Scala 2.11.11. Do you have any idea? Thanks a lot!

jayden526 commented 4 years ago

Hi @mn-mikke, as you said, "Sparkling Water 3.30.0.7-1-2.4 was compiled with Scala 2.11.12", but I don't have issues running with Scala 2.11.7 on the cluster (aside from the Maven unit-test issue). Can I just go with 2.11.7? Or are there any SW versions compiled with 2.11.7? Thanks

mn-mikke commented 4 years ago

> I checked online resources, and it looks like this issue was resolved after Scala 2.11.11. Do you have any idea? Thanks a lot!

I guess that one of your other project dependencies still references an older version of Scala. I would try to analyze it via `mvn dependency:tree` or another tool.
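A quick way to spot a conflicting Scala version (a sketch of the standard Maven dependency-plugin usage; the expected version here assumes the 2.11.12 upgrade above):

```shell
# List only the org.scala-lang artifacts in the dependency tree, so any
# module still pulling in a Scala version other than 2.11.12 stands out.
mvn dependency:tree -Dincludes=org.scala-lang

# If an older scala-library appears, run the full tree to trace which
# dependency transitively brings it in.
mvn dependency:tree
```

Once the offending dependency is found, an `<exclusion>` on it (or a `<dependencyManagement>` entry pinning scala-library) forces the single Scala version.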

mn-mikke commented 4 years ago

> Any SW versions compiled with 2.11.7?

Nope. All recent versions for Spark 2.1 - 2.4 are compiled with Scala 2.11.12.

jayden526 commented 4 years ago

Thanks @mn-mikke, this issue was finally resolved by setting `spark.ext.h2o.repl.enabled=true` in the Spark config.

jayden526 commented 4 years ago

Sorry, that should be `spark.ext.h2o.repl.enabled=false`, or `setReplDisabled()` on the `H2OConf`.
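Put together, a minimal sketch of the resolution (assuming the SW 3.30 `org.apache.spark.h2o` API used earlier in the thread, where `setReplDisabled()` is a method on `H2OConf`; this needs a running Spark environment, so it is illustrative rather than tested):

```scala
import org.apache.spark.h2o.{H2OConf, H2OContext}
import org.apache.spark.sql.SparkSession

// Sketch: disable the Scala REPL endpoints whose initialization caused the
// "Could not initialize the interpreter" failure in the Maven unit tests.
val spark = SparkSession.builder()
  .appName("h2o-glm")
  .config("spark.ext.h2o.repl.enabled", "false") // same effect as setReplDisabled()
  .getOrCreate()

val conf = new H2OConf(spark).setReplDisabled()
val h2oContext = H2OContext.getOrCreate(spark, conf)
```

With the REPL disabled, the interpreter pool is never created, so none of the `BaseH2OInterpreter` initialization from the first stack trace runs.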