scylladb / scylla-migrator

Migrate data extract using Spark to Scylla, normally from Cassandra
Apache License 2.0

Errors attempting more maxmaptasks #134

Closed — pdbossman closed this issue 1 month ago

pdbossman commented 2 months ago

I was attempting to increase the number of partitions, and I got it up to 5, but now I'm getting errors.
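For context, the partition-related settings live in the migrator's config.yaml. The sketch below is only meant to show where those knobs sit; the field names (scanSegments, maxMapTasks) follow the repository's sample DynamoDB config, and the values are placeholders rather than the ones from my actual file (attached further down as for-julien.yaml.txt):

```yaml
# Illustrative sketch only: field names follow the sample DynamoDB config in the
# repository; the values below are placeholders, not the ones used in this run.
source:
  type: dynamodb
  table: sample-table
  scanSegments: 5      # placeholder; controls scan/read parallelism
  maxMapTasks: 5       # placeholder; the setting this issue is about

target:
  type: dynamodb
  endpoint:
    host: http://scylla
    port: 8000
  table: sample-table
  streamChanges: false

savepoints:
  path: /app/savepoints
  intervalSeconds: 300
```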

```
time spark-submit --class com.scylladb.migrator.Migrator --master spark://spark-master:7077 --conf spark.eventLog.enabled=true --conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.yaml --conf spark.driver.memory=64G /home/ubuntu/scylla-migrator/migrator/target/scala-2.11/scylla-migrator-assembly-0.0.1.jar
24/05/01 18:35:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
24/05/01 18:35:37 INFO SparkContext: Running Spark version 2.4.8
24/05/01 18:35:37 INFO SparkContext: Submitted application: scylla-migrator
24/05/01 18:35:37 INFO SecurityManager: Changing view acls to: ubuntu
24/05/01 18:35:37 INFO SecurityManager: Changing modify acls to: ubuntu
24/05/01 18:35:37 INFO SecurityManager: Changing view acls groups to:
24/05/01 18:35:37 INFO SecurityManager: Changing modify acls groups to:
24/05/01 18:35:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
24/05/01 18:35:37 INFO Utils: Successfully started service 'sparkDriver' on port 45965.
24/05/01 18:35:37 INFO SparkEnv: Registering MapOutputTracker
24/05/01 18:35:37 INFO SparkEnv: Registering BlockManagerMaster
24/05/01 18:35:37 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
24/05/01 18:35:37 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
24/05/01 18:35:37 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f5569fb3-03d4-42f9-8285-30942974a654
24/05/01 18:35:37 INFO MemoryStore: MemoryStore started with capacity 34.0 GB
24/05/01 18:35:37 INFO SparkEnv: Registering OutputCommitCoordinator
24/05/01 18:35:37 INFO Utils: Successfully started service 'SparkUI' on port 4040.
24/05/01 18:35:37 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-master:4040
24/05/01 18:35:37 INFO SparkContext: Added JAR file:/home/ubuntu/scylla-migrator/migrator/target/scala-2.11/scylla-migrator-assembly-0.0.1.jar at spark://spark-master:45965/jars/scylla-migrator-assembly-0.0.1.jar with timestamp 1714588537635
24/05/01 18:35:37 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
24/05/01 18:35:37 INFO TransportClientFactory: Successfully created connection to spark-master/172.31.22.139:7077 after 28 ms (0 ms spent in bootstraps)
24/05/01 18:35:37 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20240501183537-0002
24/05/01 18:35:37 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240501183537-0002/0 on worker-20240501182238-172.31.66.134-35449 (172.31.66.134:35449) with 4 core(s)
24/05/01 18:35:37 INFO StandaloneSchedulerBackend: Granted executor ID app-20240501183537-0002/0 on hostPort 172.31.66.134:35449 with 4 core(s), 1024.0 MB RAM
24/05/01 18:35:37 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20240501183537-0002/1 on worker-20240501182240-172.31.66.134-36103 (172.31.66.134:36103) with 4 core(s)
24/05/01 18:35:37 INFO StandaloneSchedulerBackend: Granted executor ID app-20240501183537-0002/1 on hostPort 172.31.66.134:36103 with 4 core(s), 1024.0 MB RAM
24/05/01 18:35:37 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46193.
24/05/01 18:35:37 INFO NettyBlockTransferService: Server created on spark-master:46193
24/05/01 18:35:37 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
24/05/01 18:35:37 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240501183537-0002/1 is now RUNNING
24/05/01 18:35:37 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20240501183537-0002/0 is now RUNNING
24/05/01 18:35:37 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-master, 46193, None)
24/05/01 18:35:37 INFO BlockManagerMasterEndpoint: Registering block manager spark-master:46193 with 34.0 GB RAM, BlockManagerId(driver, spark-master, 46193, None)
24/05/01 18:35:37 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-master, 46193, None)
24/05/01 18:35:37 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-master, 46193, None)
24/05/01 18:35:38 INFO EventLoggingListener: Logging events to file:/tmp/spark-events/app-20240501183537-0002
24/05/01 18:35:38 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
24/05/01 18:35:38 INFO migrator: Loaded config: MigratorConfig(DynamoDB(None,None,None,sample-table,Some(12000),Some(4096),None,Some(4000)),DynamoDB(Some(DynamoDBEndpoint(http://scylla,8000)),None,Some(AWSCredentials(emp..., )),sample-table,Some(4000),None,None,Some(1),false,None),List(),Savepoints(300,/app/savepoints),Set(),Validation(false,60000,1000,100,0.001,0))
24/05/01 18:35:39 WARN ApacheUtils: NoSuchMethodException was thrown when disabling normalizeUri. This indicates you are using an old version (< 4.5.8) of Apache http client. It is recommended to use http client version >= 4.5.9 to avoid the breaking change introduced in apache client 4.5.7 and the latency in exception handling.
See https://github.com/aws/aws-sdk-java/issues/1919 for more information
24/05/01 18:35:39 WARN ClusterTopologyNodeCapacityProvider: Exception when trying to determine instance types
java.nio.file.NoSuchFileException: /mnt/var/lib/info/job-flow.json
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.Files.readAllBytes(Files.java:3152)
    at org.apache.hadoop.dynamodb.util.ClusterTopologyNodeCapacityProvider.readJobFlowJsonString(ClusterTopologyNodeCapacityProvider.java:102)
    at org.apache.hadoop.dynamodb.util.ClusterTopologyNodeCapacityProvider.getCoreNodeMemoryMB(ClusterTopologyNodeCapacityProvider.java:41)
    at org.apache.hadoop.dynamodb.util.TaskCalculator.getMaxMapTasks(TaskCalculator.java:53)
    at org.apache.hadoop.dynamodb.DynamoDBUtil.calcMaxMapTasks(DynamoDBUtil.java:271)
    at org.apache.hadoop.dynamodb.read.AbstractDynamoDBInputFormat.getSplits(AbstractDynamoDBInputFormat.java:46)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:273)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:269)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:269)
    at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:289)
    at com.scylladb.migrator.alternator.AlternatorMigrator$.migrate(AlternatorMigrator.scala:22)
    at com.scylladb.migrator.Migrator$.main(Migrator.scala:43)
    at com.scylladb.migrator.Migrator.main(Migrator.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/05/01 18:35:39 INFO alternator: We need to transfer: 5 partitions in total
24/05/01 18:35:39 INFO alternator: Starting write...
24/05/01 18:35:39 INFO DynamoUtils: Checking for table existence at destination
24/05/01 18:35:39 INFO DynamoUtils: Table sample-table exists at destination
24/05/01 18:35:40 WARN FileOutputCommitter: Output Path is null in setupJob()
24/05/01 18:40:37 WARN HeartbeatReceiver: Removing executor 1 with no recent heartbeats: 129987 ms exceeds timeout 120000 ms
24/05/01 18:40:37 ERROR TaskSchedulerImpl: Lost executor 1 on 172.31.66.134: Executor heartbeat timed out after 129987 ms
24/05/01 18:40:37 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.31.66.134, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 129987 ms
24/05/01 18:40:37 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 172.31.66.134, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 129987 ms
24/05/01 18:40:47 ERROR TaskSchedulerImpl: Lost executor 1 on 172.31.66.134: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:40:47 WARN TaskSetManager: Lost task 3.1 in stage 0.0 (TID 5, 172.31.66.134, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:42:37 WARN HeartbeatReceiver: Removing executor 0 with no recent heartbeats: 172112 ms exceeds timeout 120000 ms
24/05/01 18:42:37 ERROR TaskSchedulerImpl: Lost executor 0 on 172.31.66.134: Executor heartbeat timed out after 172112 ms
24/05/01 18:42:37 WARN TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, 172.31.66.134, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 172112 ms
24/05/01 18:42:37 WARN TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4, 172.31.66.134, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 172112 ms
24/05/01 18:42:37 WARN TaskSetManager: Lost task 1.1 in stage 0.0 (TID 6, 172.31.66.134, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 172112 ms
24/05/01 18:42:37 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.31.66.134, executor 0): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 172112 ms
24/05/01 18:42:47 ERROR TaskSchedulerImpl: Lost executor 0 on 172.31.66.134: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:43:56 ERROR TaskSchedulerImpl: Lost executor 2 on 172.31.66.134: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:43:56 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 8, 172.31.66.134, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:43:56 WARN TaskSetManager: Lost task 3.2 in stage 0.0 (TID 7, 172.31.66.134, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:43:56 WARN TaskSetManager: Lost task 4.1 in stage 0.0 (TID 10, 172.31.66.134, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
24/05/01 18:43:56 WARN TaskSetManager: Lost task 1.2 in stage 0.0 (TID 9, 172.31.66.134, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
^C24/05/01 18:47:43 ERROR SparkHadoopWriter: Aborting job job_20240501183540_0001.
org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:954)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:952)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:952)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:2164)
    at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2077)
    at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
    at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:575)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2088)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
    at com.scylladb.migrator.writers.DynamoDB$.writeRDD(DynamoDB.scala:44)
    at com.scylladb.migrator.alternator.AlternatorMigrator$.migrate(AlternatorMigrator.scala:43)
    at com.scylladb.migrator.Migrator$.main(Migrator.scala:43)
    at com.scylladb.migrator.Migrator.main(Migrator.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/05/01 18:47:43 WARN FileOutputCommitter: Output Path is null in cleanupJob()
24/05/01 18:47:43 ERROR alternator: Caught error while writing the RDD.
org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:385)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1094)
    at com.scylladb.migrator.writers.DynamoDB$.writeRDD(DynamoDB.scala:44)
    at com.scylladb.migrator.alternator.AlternatorMigrator$.migrate(AlternatorMigrator.scala:43)
    at com.scylladb.migrator.Migrator$.main(Migrator.scala:43)
    at com.scylladb.migrator.Migrator.main(Migrator.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:855)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:930)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:939)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:954)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:952)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:952)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:2164)
    at org.apache.spark.util.EventLoop.stop(EventLoop.scala:84)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2077)
    at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
    at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:575)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:759)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2067)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2088)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
    at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
    ... 23 more
```

```
real    12m7.760s
user    0m9.648s
sys     0m0.801s
```
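From the log above, the executors were granted only 1024.0 MB RAM each before the heartbeat timeouts started. If memory pressure on the workers turns out to be a factor, one variant worth trying (purely a guess on my side, not a confirmed fix) is the same command with explicit executor resources; `--executor-memory` and `--executor-cores` are standard spark-submit flags, and the sizes below are arbitrary examples:

```bash
# Same invocation as above, plus explicit executor resources.
# The 8G / 4 values are illustrative placeholders, not a recommendation.
time spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.eventLog.enabled=true \
  --conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.yaml \
  --conf spark.driver.memory=64G \
  --executor-memory 8G \
  --executor-cores 4 \
  /home/ubuntu/scylla-migrator/migrator/target/scala-2.11/scylla-migrator-assembly-0.0.1.jar
```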

The config.yaml I used on this run: for-julien.yaml.txt

julienrf commented 1 month ago

Fixed by #143