projectglow / glow

An open-source toolkit for large-scale genomic analysis
https://projectglow.io
Apache License 2.0
264 stars 111 forks source link

spark.kryoserializer.buffer.max error #426

Closed jatin-sandhuria closed 2 years ago

jatin-sandhuria commented 2 years ago

I am getting below issue while running logistic regression with offset dataframe. Any recommendations on how much spark.kryoserializer.buffer.max value we would need to run logistic regressions ? Do we need to set that at cluster level or within notebook using spark config ?

/databricks/spark/python/pyspark/sql/pandas/conversion.py:145: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect on failures in the middle of computation. An error occurred while calling o21709.getResult. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:428) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:107) at org.apache.spark.security.SocketAuthServer.getResult(SocketAuthServer.scala:103) at sun.reflect.GeneratedMethodAccessor614.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380) at py4j.Gateway.invoke(Gateway.java:295) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:251) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2221.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2221.0 (TID 112200) (10.45.234.147 executor 231): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 67108864. To avoid this, increase spark.kryoserializer.buffer.max value. at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:415) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$19(Executor.scala:859) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:859) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:672) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 67108864 at com.esotericsoftware.kryo.io.Output.require(Output.java:167) at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251) at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:237) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:49) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:38) at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:629) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:332) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:411) ... 9 more

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2783) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2730) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2724) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2724) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1260) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1260) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1260) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2991) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2932) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2920) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1033) at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2474) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2457) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2569) at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToPython$6(Dataset.scala:3751) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1605) at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToPython$3(Dataset.scala:3755) at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToPython$3$adapted(Dataset.scala:3721) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3815) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:126) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:269) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:104) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852) at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:219) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3813) at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToPython$2(Dataset.scala:3721) at org.apache.spark.sql.Dataset.$anonfun$collectAsArrowToPython$2$adapted(Dataset.scala:3720) at org.apache.spark.security.SocketAuthServer$.$anonfun$serveToStream$2(SocketAuthServer.scala:153) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1605) at org.apache.spark.security.SocketAuthServer$.$anonfun$serveToStream$1(SocketAuthServer.scala:155) at org.apache.spark.security.SocketAuthServer$.$anonfun$serveToStream$1$adapted(SocketAuthServer.scala:150) at org.apache.spark.security.SocketFuncServer.handleConnection(SocketAuthServer.scala:124) at org.apache.spark.security.SocketFuncServer.handleConnection(SocketAuthServer.scala:117) at org.apache.spark.security.SocketAuthServer$$anon$1.$anonfun$run$4(SocketAuthServer.scala:70) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.security.SocketAuthServer$$anon$1.run(SocketAuthServer.scala:70) Caused by: org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 67108864. To avoid this, increase spark.kryoserializer.buffer.max value. at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:415) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$19(Executor.scala:859) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:859) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:672) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 67108864 at com.esotericsoftware.kryo.io.Output.require(Output.java:167) at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:251) at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:237) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:49) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:38) at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:629) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:332) at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:302) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651) at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:411) ... 9 more

warnings.warn(msg)

jatin-sandhuria commented 2 years ago

Fixed it by adding spark.kryoserializer.buffer.max config with value 2047m at the cluster level config.