Closed: sasikiran closed this issue 4 years ago.
Hi, Currently, internal mode of Sparkling Water does not support auto-scaling. It’s on our long-term roadmap, however for now, please turn off auto-scaling and dynamic allocation.
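For example, dynamic allocation can be turned off in the cluster's Spark config (a minimal sketch, assuming a standard Databricks cluster; autoscaling itself is disabled by unchecking "Enable autoscaling" in the cluster settings):

spark.dynamicAllocation.enabled false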
Thanks,
Kuba
On May 12, 2019, at 11:55 AM, Sasi Kiran Malladi wrote:
I'm trying to use Sparkling Water on Azure Databricks and I'm not able to create the H2O context. I tried this in both R and Scala on the same cluster.
R sample code:

install.packages("sparklyr")
install.packages("rsparkling")
install.packages("h2o", type="source", repos="https://h2o-release.s3.amazonaws.com/h2o/rel-yates/2/R")

library(rsparkling)
library(sparklyr)

sc <- spark_connect(method="databricks")
h2o_context(sc)
Scala sample code:
import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)
Configuration details
Error log:

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 14.0 failed 4 times, most recent failure: Lost task 1.3 in stage 14.0 (TID 144, 10.139.64.6, executor 18): ExecutorLostFailure (executor 18 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2355)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2343)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2342)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2342)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1096)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2574)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2510)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:893)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2233)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2274)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:961)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:379)
at org.apache.spark.rdd.RDD.collect(RDD.scala:960)
at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.startH2O(InternalBackendUtils.scala:196)
at org.apache.spark.h2o.backends.internal.InternalBackendUtils$.startH2O(InternalBackendUtils.scala:306)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:104)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:438)
at org.apache.spark.h2o.H2OContext.getOrCreate(H2OContext.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sparklyr.Invoke.invoke(invoke.scala:139)
at sparklyr.StreamHandler.handleMethodCall(stream.scala:123)
at sparklyr.StreamHandler.read(stream.scala:66)
at sparklyr.BackendHandler.channelRead0(handler.scala:51)
at sparklyr.BackendHandler.channelRead0(handler.scala:4)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Hi @jakubhava, I created a new Databricks cluster without autoscaling, setting the number of workers to 4. I ran the same code in R and Scala, and the same error happened.
Databricks configuration:
Databricks runtime: 5.3 ML (includes Apache Spark 2.4.0, Scala 2.11)
Enable autoscaling: No
Python version: 3
Worker type: Standard_DS3_v2, 14.0 GB Memory, 4 Cores, 0.75 DBU
Number of workers: 4
Driver type: Standard_DS3_v2, 14.0 GB Memory, 4 Cores, 0.75 DBU
Sparkling Water library: sparkling_water_assembly_2_11_2_4_10_all.jar
Error log:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 5.0 failed 4 times, most recent failure: Lost task 3.3 in stage 5.0 (TID 156, 10.139.64.7, executor 17): ExecutorLostFailure (executor 17 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:2355)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2343)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:2342)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2342)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:1096)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1096)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2574)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2522)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2510)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:893)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2233)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2255)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2274)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2299)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:961)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:379)
at org.apache.spark.rdd.RDD.collect(RDD.scala:960)
at org.apache.spark.h2o.backends.internal.InternalBackendUtils$class.startH2O(InternalBackendUtils.scala:196)
at org.apache.spark.h2o.backends.internal.InternalBackendUtils$.startH2O(InternalBackendUtils.scala:306)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:104)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:419)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:3)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:48)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:50)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw$$iw.<init>(command-3675229254105724:52)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw$$iw.<init>(command-3675229254105724:54)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$$iw.<init>(command-3675229254105724:56)
at linea79e6cc413f242d48e96cfbe9622892d25.$read.<init>(command-3675229254105724:58)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$.<init>(command-3675229254105724:62)
at linea79e6cc413f242d48e96cfbe9622892d25.$read$.<clinit>(command-3675229254105724)
at linea79e6cc413f242d48e96cfbe9622892d25.$eval$.$print$lzycompute(<notebook>:7)
at linea79e6cc413f242d48e96cfbe9622892d25.$eval$.$print(<notebook>:6)
at linea79e6cc413f242d48e96cfbe9622892d25.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:199)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:190)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:190)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:190)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:590)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:545)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:190)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:323)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:303)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:47)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:47)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:303)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at scala.util.Try$.apply(Try.scala:192)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:586)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:477)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:544)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:383)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:330)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:216)
at java.lang.Thread.run(Thread.java:748)
Can you please share the YARN logs?
I am running this on Databricks. There is no YARN integration.
I believe this is not true. Internally, Databricks runs its Spark on Hadoop as well.
What we need are the full logs from the executors and the driver.
I found these logs under the Spark driver logs. Are these sufficient?
Log4j: http://txt.do/1d0iv
StdErr: https://file.io/OPri5A
StdOut: https://file.io/G37Zar
Thank you @sasikiran, will have a look
Could you please try the nightly build from this link? We changed some internal configuration which may influence this behavior: https://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/nightly/91/index.html
Thank you!
Thanks @jakubhava. I used the nightly build on the same cluster with the same code. The error remains the same. Do you want me to share the logs again?
Oki, thank you! I think it is fine; we will think about what we can do.
Hi @sasikiran, we have been working on a better solution. Can you please give it another try with this nightly release? This one should do: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/nightly/98/index.html
Hi @jakubhava, thanks for the update. I tried it out using the new nightly build. I am getting a different error this time. I tried it in both R and Scala.
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$$anonfun$startH2OWorkers$1.apply(InternalH2OBackend.scala:161)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$$anonfun$startH2OWorkers$1.apply(InternalH2OBackend.scala:159)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.startH2OWorkers(InternalH2OBackend.scala:159)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.org$apache$spark$h2o$backends$internal$InternalH2OBackend$$startH2OCluster(InternalH2OBackend.scala:97)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:76)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:129)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:403)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:419)
at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:3)
at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:48)
at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw$$iw.<init>(command-3675229254105724:50)
at line5a625531dac549d782510f57d8dab78a25.$read$$iw$$iw$$iw.<init>(command-3675229254105724:52)
Here are the detailed logs:
stderr - https://file.io/LFuxcr
stdout - https://file.io/zlrMRu
log4j - https://file.io/e1hVZU
Thank you, this is helpful! This is still a fairly new change, so we need to iterate.
However, the main error happens on the executor machines. Could you please send us the logs from the executors?
I was able to find these. Please let me know if these are good enough.
Executor 1 log: https://file.io/NZsDMH
Executor 2 log: https://file.io/kjlqAT
Thanks, I had a look inside but did not see useful logs. I will spend some time on it and try to reproduce it myself.
I created a testing environment on Azure Databricks with the same configuration you have, but I still can't reproduce it. I'm testing using Scala with the simple code below:
import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)
Quick question: are you specifying any additional Spark options?
What the error above says is that the driver did not get a response from an H2O worker node, so the H2O worker node probably didn't start. The logs you sent do not contain information about this behavior.
But you should be able to get the right logs for the failing case: go to the Spark UI, click on Executors, and send here the stdout & stderr from the executors. That would help us a lot.
These screenshots should help with getting the right logs
Hi @jakubhava, I think there is some conflict between the Databricks runtime and Sparkling Water. I created a new cluster, selected Databricks runtime 5.3 (includes Apache Spark 2.4.0, Scala 2.11), and ran it. It worked. Then I went back to my previous cluster, which uses 5.3 ML (includes Apache Spark 2.4.0, Scala 2.11), and ran the same code. It failed with "Exception thrown in awaitResult".
As a workaround, I will use Sparkling Water on Databricks runtime 5.3.
Thanks so much!
Hi @sasikiran thanks so much, this information will help us with debugging!
@sasikiran just FYI: I was able to reproduce the issue
The core issue is:
java.lang.IllegalAccessError: tried to access class ml.dmlc.xgboost4j.java.NativeLibLoader from class hex.tree.xgboost.XGBoostExtension
at hex.tree.xgboost.XGBoostExtension.initXgboost(XGBoostExtension.java:68)
at hex.tree.xgboost.XGBoostExtension.isEnabled(XGBoostExtension.java:49)
at water.ExtensionManager.isEnabled(ExtensionManager.java:189)
at water.ExtensionManager.registerCoreExtensions(ExtensionManager.java:103)
at water.H2O.main(H2O.java:2002)
at water.H2OStarter.start(H2OStarter.java:22)
at water.H2OStarter.start(H2OStarter.java:47)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.startH2OWorker(InternalH2OBackend.scala:123)
at org.apache.spark.h2o.backends.internal.H2ORpcEndpoint$$anonfun$receiveAndReply$1.applyOrElse(H2ORpcEndpoint.scala:52)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:226)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
That’s awesome.
Hi @jakubhava, do you have any workarounds or recommendations for this issue?
Adding @honzasterba, as I discussed this with him. Do you remember the issue with XGBoost conflicts we discussed on Slack?
I think this was caused by there being two XGBoost jars on the classpath, one from H2O and one from another library; this caused various issues with XGBoost classes being present twice.
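As a quick check, the classloader can be asked which classpath entries provide the conflicting class (a minimal Scala sketch; it assumes you can run a cell on the driver):

// Prints every classpath entry that contains the XGBoost loader class;
// more than one URL means duplicate XGBoost jars are present.
import scala.collection.JavaConverters._
getClass.getClassLoader
  .getResources("ml/dmlc/xgboost4j/java/NativeLibLoader.class")
  .asScala
  .foreach(println)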
Yup, exactly. Do you have any workaround in mind?
1) Figure out where the second XGBoost is coming from and whether you can somehow prevent it from getting on the classpath.
2) If not, then we will have to fix it on our side (by a package rename or something similar).
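For option 1, if the second copy arrives transitively through another dependency, a Maven exclusion along these lines may work (a sketch only; com.example:some-ml-library is a placeholder for whichever dependency pulls XGBoost in):

<dependency>
  <groupId>com.example</groupId> <!-- placeholder for the dependency that drags in xgboost4j -->
  <artifactId>some-ml-library</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- keep only the XGBoost bundled with H2O -->
    <exclusion>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>ml.dmlc</groupId>
      <artifactId>xgboost4j-spark</artifactId>
    </exclusion>
  </exclusions>
</dependency>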
I see, thank you.
One more question: which versions of Spark and Scala were used for the latest Sparkling Water release? It seems that I've found the issue with XGBoost and fixed it, but now I'm facing some issues with the compiler.
I'm using Spark 2.4.3 with Scala 2.11.12 on Win10 x64 with Hadoop 2.7.1 via winutils. Should I create a new issue for this problem?
java.lang.NoClassDefFoundError: scala/tools/nsc/interpreter/InteractiveReader
at water.api.scalaInt.ScalaCodeHandler.createInterpreterInPool(ScalaCodeHandler.scala:145)
at water.api.scalaInt.ScalaCodeHandler$$anonfun$initializeInterpreterPool$1.apply(ScalaCodeHandler.scala:139)
at water.api.scalaInt.ScalaCodeHandler$$anonfun$initializeInterpreterPool$1.apply(ScalaCodeHandler.scala:138)
at scala.collection.immutable.Range.foreach(Range.scala:160)
at water.api.scalaInt.ScalaCodeHandler.initializeInterpreterPool(ScalaCodeHandler.scala:138)
at water.api.scalaInt.ScalaCodeHandler.<init>(ScalaCodeHandler.scala:42)
at water.api.scalaInt.ScalaCodeHandler$.registerEndpoints(ScalaCodeHandler.scala:171)
at water.api.CoreRestAPI$.registerEndpoints(CoreRestAPI.scala:32)
at water.api.RestAPIManager.register(RestAPIManager.scala:39)
at water.api.RestAPIManager.registerAll(RestAPIManager.scala:31)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.init(InternalH2OBackend.scala:76)
at org.apache.spark.h2o.H2OContext.init(H2OContext.scala:128)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:396)
at org.apache.spark.h2o.H2OContext$.getOrCreate(H2OContext.scala:431)
at test.spark.assembly.job.Test$$anonfun$working$1.apply(Test.scala:76)
at test.spark.assembly.job.Test$$anonfun$working$1.apply(Test.scala:69)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: scala.tools.nsc.interpreter.InteractiveReader
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more
Yes, please create a new issue for this one, thanks
@MrHadgehog How did you fix the XGBoost issue? I'm having the same issue in my project. I have the following XGBoost dependency in my project along with the H2O Sparkling Water dependency:

<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark</artifactId>
  <version>0.82</version>
</dependency>
<dependency>
  <groupId>ai.h2o</groupId>
  <artifactId>sparkling-water-package_2.11</artifactId>
  <version>3.28.0.1-1-2.3</version>
</dependency>

@jakubhava Is there any way I can avoid this XGBoost error: java.lang.IllegalAccessError: tried to access class ml.dmlc.xgboost4j.java.NativeLibLoader from class hex.tree.xgboost.XGBoostExtension?
Hi @BhushG, sorry for the late reply. I removed the ml.dmlc.xgboost4j dependency (and any similar ones) from the entire project. After that, the issue was solved.
@MrHadgehog @jakubhava Hi, I found a workaround for this problem. Apparently, the order of the dependencies matters in Maven: when two jars provide the same class, the one that appears first on the classpath wins. So I moved the H2O dependency above the xgboost4j-spark dependency and it worked:
<dependency>
  <groupId>ai.h2o</groupId>
  <artifactId>sparkling-water-package_2.11</artifactId>
  <version>${h2o.automl.version}</version>
</dependency>
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j-spark</artifactId>
  <version>${xgboost4j.spark.version}</version>
</dependency>
This way, H2O doesn't throw any error related to XGBoost.
Hi, how do I resolve this XGBoost conflict error in Azure Databricks?
Py4JJavaError: An error occurred while calling o411.getOrCreate.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:431)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.$anonfun$startH2OWorkers$1(InternalH2OBackend.scala:183)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend$.org$apache$spark$h2o$backends$internal$InternalH2OBackend$$startH2OWorkers(InternalH2OBackend.scala:181)
at org.apache.spark.h2o.backends.internal.InternalH2OBackend.startH2OCluster(InternalH2OBackend.scala:48)
at org.apache.spark.h2o.H2OContext.
@garimagupta25 Can you use a plain, non-ML version of the Databricks runtime?
Hello, bumping this issue again. I'm working on a Databricks cluster on which the XGBoost4j libraries are installed. I'm trying to create a dummy model using PySparkling and I'm hitting the same issues described above. I cannot uninstall the XGBoost libraries, since they're needed for other apps running on our cluster, and I cannot switch the cluster mode either.
Thus, the solution proposed by @honzasterba is the only option for my team to use PySparkling in our current setup:
- If not, then we will have to fix it on our side (by a package rename or something similar).
Is this feasible? I'm currently using PySparkling version 3.1 and the issue still remains.