Hive on S3 was painfully slow on EMR, so I wanted to use this committer.
Unfortunately, I am hitting the issue shown below.
Any idea why?
Channel is closed. Failed to relay message
com.netflix.bdp.s3.AlreadyExistsException: Output path already exists: s3://perfteam/ftlitmustest/tpch_s3_tgt_102/.hive-staging_hive_2017-07-18_05-17-39_761_511309325071892267-1/-ext-10000
at com.netflix.bdp.s3.S3DirectoryOutputCommitter.commitJob(S3DirectoryOutputCommitter.java:72)
at com.infa.s3.committer.FTCustomCommitter.commitJob(FTCustomCommitter.java:33)
at org.apache.spark.sql.hive.SparkHiveWriterContainer.commitJob(hiveWriterContainers.scala:121)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:160)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:259)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
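For context, I have not set any conflict-resolution mode for the committer, so I believe the default (fail on an existing output path) applies. If I am reading the s3committer README correctly, the mode could be overridden via the Hadoop configuration, e.g. passed through Spark like this (the property name and values are my reading of the docs, and I have not verified that this is the right fix for my case):

```shell
# Hypothetical override: tell the committer to replace (or append to)
# existing output instead of failing when the path already exists.
# "spark.hadoop." prefixes are forwarded into the Hadoop Configuration.
spark-submit \
  --conf spark.hadoop.s3.multipart.committer.conflict-mode=replace \
  ...
```

Should something like this be necessary for Hive insert paths, or is the staging directory collision a separate problem?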