Open apoorvedave1 opened 4 years ago
It seems FileContext.rename has the same semantics: the atomicity is implementation-dependent. Source: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileContext.html#rename-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.Options.Rename...-
I assume this is not a serious problem, as HDFS requires atomic rename for any HDFS-compatible file system: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Atomicity
Describe the issue
The operation log in Hyperspace relies on an atomic rename of log files to support concurrent operations. These operations use the org.apache.hadoop.fs.FileSystem.rename() API, which doesn't provide atomicity guarantees as strong as those of org.apache.hadoop.fs.FileContext.rename().
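To make the concurrency concern concrete, here is a minimal local-filesystem sketch of the commit pattern described above. It is not Hyperspace code and it does not use the Hadoop API: the class `OperationLogSketch` and its methods are hypothetical, and a `java.nio` hard link stands in for a rename that fails when the destination already exists. In Hadoop terms, step 2 is where the rename call under discussion would sit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: local files via java.nio, not Hadoop.
final class OperationLogSketch {

    // Tries to publish log entry `id` in `logDir`.
    // Returns true if this writer created the entry,
    // false if an entry with that id was already published.
    static boolean commit(Path logDir, int id, String content) {
        Path finalPath = logDir.resolve(Integer.toString(id));
        try {
            // Step 1: write the complete entry under a unique temporary name.
            Path temp = Files.createTempFile(logDir, "temp-", ".tmp");
            try {
                Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
                // Step 2: publish under the final name. Assumption: the local
                // filesystem supports hard links; creating the link fails if
                // the final name already exists, so only one writer succeeds.
                Files.createLink(finalPath, temp);
                return true;
            } catch (FileAlreadyExistsException e) {
                return false; // another writer published this id first
            } finally {
                Files.deleteIfExists(temp); // the published link, if any, remains
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two writers try to publish the same log id one after the other.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("oplog");
            boolean first = commit(dir, 1, "writer-A");
            boolean second = commit(dir, 1, "writer-B");
            String stored = new String(
                Files.readAllBytes(dir.resolve("1")), StandardCharsets.UTF_8);
            return first + " " + second + " " + stored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the publish step could silently replace an existing destination, or could expose a partially moved file, both writers might believe they own log id 1. That is the failure mode a stronger atomicity guarantee is meant to rule out.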
Expected behavior
Better atomicity guarantee
More Details
From org.apache.spark.sql.execution.streaming.CheckpointFileManager, which also relies on atomic renames of checkpoints (similar to atomic renames of Hyperspace operation logs),
Environment
NA