torvalds-dev-testbot[bot] opened this issue 10 months ago
torvalds.dev is analyzing the ticket
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced

Hi, I am using Spark to write to a Hudi table but I get the following error:

23/12/28 18:51:39 INFO DAGScheduler: Job 8 finished: collectAsMap at HoodieSparkEngineContext.java:164, took 0.089813 s
23/12/28 18:51:39 WARN WriteMarkers: Error deleting marker directory for instant 00000000000000010
org.apache.hudi.exception.HoodieIOException: `s3a://raw-bucket/bronze/.hoodie/metadata/.hoodie/.temp/00000000000000010': Directory is not empty
  at org.apache.hudi.common.fs.FSUtils.deleteDir(FSUtils.java:720)
  at org.apache.hudi.table.marker.DirectWriteMarkers.deleteMarkerDir(DirectWriteMarkers.java:82)
  at org.apache.hudi.table.marker.WriteMarkers.quietDeleteMarkerDir(WriteMarkers.java:147)
  at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:567)
  at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:545)
  at org.apache.hudi.client.SparkRDDWriteClient.bulkInsertPreppedRecords(SparkRDDWriteClient.java:239)
  at org.apache.hudi.client.SparkRDDWriteClient.bulkInsertPreppedRecords(SparkRDDWriteClient.java:63)
  at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1129)
  at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.bulkCommit(SparkHoodieBackedTableMetadataWriter.java:130)
  at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFromFilesystem(HoodieBackedTableMetadataWriter.java:445)
  at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeIfNeeded(HoodieBackedTableMetadataWriter.java:278)
  at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:182)
  at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:95)
  at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:72)
  at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:287)
  at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
  at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1256)
  at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
  at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:139)
  at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:224)
  at org.apache.hudi.HoodieSparkSqlWriter$.writeInternal(HoodieSparkSqlWriter.scala:431)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:132)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:150)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:47)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
  at org.apache.spark.sql.execution.QueryExecution$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
  at org.apache.spark.sql.execution.QueryExecution$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
  at org.apache.spark.sql.execution.QueryExecution$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$super$transformDownWithPruning(LogicalPlan.scala:31)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
  at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:387)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:360)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
  at com.zdp.core.batch.service.impl.HudiDataManager.insert(HudiDataManager.scala:78)
  at StorageConnector$.main(StorageConnector.scala:23)
  at StorageConnector.main(StorageConnector.scala)
Caused by: org.apache.hadoop.fs.PathIsNotEmptyDirectoryException: `s3a://raw-bucket/bronze/.hoodie/metadata/.hoodie/.temp/00000000000000010': Directory is not empty
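For context, the job entry point is StorageConnector.main and the write goes through HudiDataManager.insert (both visible in the trace above). The session setup is roughly the sketch below; the app name and the S3A settings shown are placeholders for illustration, not my exact configuration:

import org.apache.spark.sql.SparkSession

// Rough sketch of the session setup (illustrative only; real bucket/credential config differs).
val spark = SparkSession.builder()
  .appName("StorageConnector")
  // Kryo serialization, as set in the Hudi Spark quick-start guide.
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // S3A filesystem comes from the hadoop-aws 3.3.1 dependency listed below.
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .getOrCreate()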
Below are my Hudi options; the save mode is Overwrite:

Map(
  "hoodie.table.name" -> configurationDTO.tableName,
  "hoodie.datasource.write.recordkey.field" -> "emp_id",
  "hoodie.datasource.write.table.name" -> configurationDTO.tableName,
  //"hoodie.datasource.write.operation" -> "upsert",
  "hoodie.datasource.write.precombine.field" -> "ts",
  "hoodie.upsert.shuffle.parallelism" -> "2",
  "hoodie.insert.shuffle.parallelism" -> "2",
  RECORDKEY_FIELD.key() -> "emp_id",
  PARTITIONPATH_FIELD.key() -> "state,department"
)

Dependencies:

"org.apache.spark" %% "spark-core" % "3.2.0",
"org.apache.spark" %% "spark-sql" % "3.2.0",
"org.apache.spark" %% "spark-streaming" % "3.2.0",
"org.apache.hudi" %% "hudi-spark3.2-bundle" % "0.14.0",
"org.apache.hadoop" % "hadoop-common" % "3.3.1",
"org.apache.hadoop" % "hadoop-client" % "3.3.1",
"org.apache.avro" % "avro" % "1.10.2",
"org.apache.avro" % "avro-mapred" % "1.10.2" % "test",
"org.apache.avro" % "avro-tools" % "1.10.2" % "test",
"com.lihaoyi" %% "ujson" % "3.1.2",
"org.apache.hadoop" % "hadoop-aws" % "3.3.1",
"com.amazonaws" % "aws-java-sdk" % "1.12.622"

I am using Hudi 0.14.
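The write itself is the standard DataFrameWriter path. Below is a simplified sketch of what HudiDataManager.insert does; hudiOptions stands for the option Map above and basePath for the s3a table path, so both names here are placeholders:

import org.apache.spark.sql.{DataFrame, SaveMode}

// Simplified sketch of HudiDataManager.insert: write the DataFrame as a Hudi table.
// `hudiOptions` is the option Map shown above; `basePath` points at the s3a://raw-bucket/bronze table path.
def insert(df: DataFrame, hudiOptions: Map[String, String], basePath: String): Unit = {
  df.write
    .format("hudi")
    .options(hudiOptions)
    .mode(SaveMode.Overwrite) // the save mode is Overwrite, as noted above
    .save(basePath)
}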
To Reproduce

Steps to reproduce the behavior:
1.
2.
3.
4.
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version : 0.14.0
Spark version : 3.2.0
Hive version :
Hadoop version : 3.3.1
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
The full stacktrace is included in the problem description above.