rdblue / s3committer

Hadoop output committers for S3
Apache License 2.0

failure when attempting to use the committer when writing into a hive table #6

Open ftago opened 7 years ago

ftago commented 7 years ago

I hated the slow nature of Hive on S3 on EMR, so I wanted to use this committer. Unfortunately, I am hitting the issue shown below.

Any idea why?

```
Channel is closed. Failed to relay message
com.netflix.bdp.s3.AlreadyExistsException: Output path already exists: s3://perfteam/ftlitmustest/tpch_s3_tgt_102/.hive-staging_hive_2017-07-18_05-17-39_761_511309325071892267-1/-ext-10000
    at com.netflix.bdp.s3.S3DirectoryOutputCommitter.commitJob(S3DirectoryOutputCommitter.java:72)
    at com.infa.s3.committer.FTCustomCommitter.commitJob(FTCustomCommitter.java:33)
    at org.apache.spark.sql.hive.SparkHiveWriterContainer.commitJob(hiveWriterContainers.scala:121)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:160)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:259)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
```
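From the trace, `S3DirectoryOutputCommitter.commitJob` is refusing to commit because the target directory already exists, which is its default behavior on conflict. If I understand the committer's conflict-mode setting correctly (I'm assuming the property name `s3.multipart.committer.conflict-mode` and the `replace`/`append` values from the project docs apply here), a possible workaround would be something like:

```shell
# Assumed property name and values; "replace" deletes existing output
# at commit time, "append" adds new files alongside the existing ones.
# The default is "fail", which matches the AlreadyExistsException above.
spark-submit \
  --conf spark.hadoop.s3.multipart.committer.conflict-mode=replace \
  my-hive-insert-job.jar
```

(`my-hive-insert-job.jar` is just a placeholder for the actual job.) That said, since the path here is a `.hive-staging` directory created by Hive itself rather than the final table location, I'm not sure the conflict mode alone resolves it.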