oracle / oci-hdfs-connector

HDFS Connector for Oracle Cloud Infrastructure
https://cloud.oracle.com/cloud-infrastructure

BmcDataStore -> renameDirectory fails to rename when a file has special characters(*,$,^,?,+,|) #46

Closed subhanalishasheik closed 3 years ago

subhanalishasheik commented 3 years ago

https://github.com/oracle/oci-hdfs-connector/blob/854920265be7d787a001f66f205a5022da50fe0b/hdfs-connector/src/main/java/com/oracle/bmc/hdfs/store/BmcDataStore.java

We use Spark to write files to Object Storage through this connector. The rename fails when a file or folder name contains special characters. We observed it for *, $, ^, ?, +, and |, but it is not limited to these.

Analysis:

The renameDirectory() method in BmcDataStore uses the code below to compute the new object name. String's replaceFirst method treats its first parameter as a regular expression, so any regex metacharacters present in the input need to be escaped.

final String newObjectName = objectToRename.replaceFirst(sourceDirectory, destinationDirectory);
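One possible way to make the call above treat both arguments literally is sketched below. It is only an illustration, not necessarily the fix that was applied in the connector. `Pattern.quote` and `Matcher.quoteReplacement` are standard `java.util.regex` methods; the class name `RenameQuoting` and the method name `renamePrefix` are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the connector: replaces the first literal
// occurrence of sourceDirectory in objectToRename with destinationDirectory.
final class RenameQuoting {
    static String renamePrefix(String objectToRename,
                               String sourceDirectory,
                               String destinationDirectory) {
        return objectToRename.replaceFirst(
                // Match the source text literally, so $ ^ * ? + | lose their regex meaning.
                Pattern.quote(sourceDirectory),
                // Keep $ and \ in the destination from being read as group references or escapes.
                Matcher.quoteReplacement(destinationDirectory));
    }

    public static void main(String[] args) {
        String sourceDir = "test$11/_temporary/0/_temporary/attempt_20210507141458_0001_m_000000_1/";
        String destDir = "test11/_temporary/0/task_20210507141458_0001_m_000000/";
        // Prints the destination directory, since the whole object name matches the source.
        System.out.println(renamePrefix(sourceDir, sourceDir, destDir));
    }
}
```

Quoting the replacement matters as well as quoting the pattern: an unquoted `$` in the destination name is interpreted by replaceFirst as a group reference.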

Example to reproduce:

String objectToRename = "test$11/_temporary/0/_temporary/attempt_20210507141458_0001_m_000000_1/";
String sourceDir = "test$11/_temporary/0/_temporary/attempt_20210507141458_0001_m_000000_1/";
String destDir = "test11/_temporary/0/task_20210507141458_0001_m_000000/";
String newName = objectToRename.replaceFirst(sourceDir, destDir);
System.out.println("ObjectToRename:" + objectToRename);
System.out.println("newName:" + newName);

Output:

ObjectToRename: test$11/_temporary/0/_temporary/attempt_20210507141458_0001_m_000000_1/
newName: test$11/_temporary/0/_temporary/attempt_20210507141458_0001_m_000000_1/

Since the ObjectToRename and newName values are the same, the Object Storage rename API throws "Caused by: com.oracle.bmc.model.BmcException: (400, SourceNameSameAsNewName, false) Source name should not be the same as new name (opc-request-id: iad-1:OVKmOCJnZquWhV284uYL-g1DlTA0pnW4XUpjhVTrgB0322Ua_hFGxfVkZcOIqxUK)"
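The same check can be repeated for each of the characters listed in this report. The sketch below uses short made-up names ("test" + character + "11/" as the source and "out/" as the destination), not real object names, and the class `SpecialCharCheck` exists only for this example. It passes the name unescaped as the regex, the way the current code does.

```java
// Hypothetical check, not connector code: shows what an unescaped
// replaceFirst does when the source name contains a regex metacharacter.
final class SpecialCharCheck {
    // Mirrors the problematic call: the name itself is used as the regex.
    static String unquoted(String name, String dest) {
        return name.replaceFirst(name, dest);
    }

    public static void main(String[] args) {
        String dest = "out/"; // made-up destination
        for (char c : new char[] {'*', '$', '^', '?', '+', '|'}) {
            String name = "test" + c + "11/"; // made-up source name
            String result = unquoted(name, dest);
            System.out.println(c + " -> " + result
                    + (result.equals(name) ? " (unchanged)" : ""));
        }
    }
}
```

With these inputs the names containing *, $, ^, ? and + come back unchanged, which is the case that triggers SourceNameSameAsNewName. The | case behaves differently: the regex becomes an alternation, only "test" is replaced, and the result is "out/|11/", a wrong name rather than an unchanged one.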

Stacktrace

03:25:24.898 [task-result-getter-0] WARN o.a.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, 172.18.0.12, executor 0): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Unable to perform rename
    at com.oracle.bmc.hdfs.store.BmcDataStore.rename(BmcDataStore.java:246)
    at com.oracle.bmc.hdfs.store.BmcDataStore.renameDirectory(BmcDataStore.java:228)
    at com.oracle.bmc.hdfs.BmcFilesystem.rename(BmcFilesystem.java:468)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:578)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:549)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
    at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
    at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
    ... 10 more
Caused by: com.oracle.bmc.model.BmcException: (400, SourceNameSameAsNewName, false) Source name should not be the same as new name (opc-request-id: iad-1:JKNpip3rbb03hw5UpYH_0KJtPiYT5jmsYvgl8r0UUfYkoubuT0673UvDlRML1Rdb)
    at com.oracle.bmc.http.internal.ResponseHelper.throwIfNotSuccessful(ResponseHelper.java:138)
    at com.oracle.bmc.http.internal.ResponseConversionFunctionFactory$ValidatingParseResponseFunction.apply(ResponseConversionFunctionFactory.java:88)
    at com.oracle.bmc.http.internal.ResponseConversionFunctionFactory$ValidatingParseResponseFunction.apply(ResponseConversionFunctionFactory.java:84)
    at com.oracle.bmc.objectstorage.internal.http.RenameObjectConverter$1.apply(RenameObjectConverter.java:81)
    at com.oracle.bmc.objectstorage.internal.http.RenameObjectConverter$1.apply(RenameObjectConverter.java:69)
    at com.oracle.bmc.objectstorage.ObjectStorageClient.lambda$null$86(ObjectStorageClient.java:1745)
    at com.oracle.bmc.retrier.BmcGenericRetrier.doFunctionCall(BmcGenericRetrier.java:88)
    at com.oracle.bmc.retrier.BmcGenericRetrier.lambda$execute$0(BmcGenericRetrier.java:59)
    at com.oracle.bmc.waiter.GenericWaiter.execute(GenericWaiter.java:55)
    at com.oracle.bmc.retrier.BmcGenericRetrier.execute(BmcGenericRetrier.java:50)
    at com.oracle.bmc.objectstorage.ObjectStorageClient.lambda$renameObject$87(ObjectStorageClient.java:1737)
    at com.oracle.bmc.retrier.BmcGenericRetrier.doFunctionCall(BmcGenericRetrier.java:88)
    at com.oracle.bmc.retrier.BmcGenericRetrier.lambda$execute$0(BmcGenericRetrier.java:59)
    at com.oracle.bmc.waiter.GenericWaiter.execute(GenericWaiter.java:55)
    at com.oracle.bmc.retrier.BmcGenericRetrier.execute(BmcGenericRetrier.java:50)
    at com.oracle.bmc.objectstorage.ObjectStorageClient.renameObject(ObjectStorageClient.java:1731)
    at com.oracle.bmc.hdfs.store.RenameOperation.call(RenameOperation.java:38)
    at com.oracle.bmc.hdfs.store.BmcDataStore.rename(BmcDataStore.java:241)
    ... 22 more

omkar07 commented 3 years ago

Hi @subhanalishasheik, thanks for reporting this issue! The fix was released yesterday. Please use the latest version of the HDFS connector, i.e. 3.3.0.7.0.0. Closing this issue for now.

milan-kathrotia commented 2 years ago

Hi @omkar07

We are still facing this issue with the $ character on version 3.3.1.0.3.3.

jodoglevy commented 2 years ago

@milan-kathrotia we'll take a look, thanks