```
> mpiexec -n 4 python from_url.py
Traceback (most recent call last):
  File "from_url.py", line 5, in <module>
    pfio.v2.from_url('non-existent-dir', create=True)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/fs.py", line 359, in from_url
    fs = _from_scheme(scheme, dirname, kwargs, bucket=parsed.netloc)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/fs.py", line 371, in _from_scheme
    fs = Local(dirname, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/local.py", line 56, in __init__
    os.makedirs(self._cwd)
  File "/usr/local/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'non-existent-dir'
```
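The failure does not need MPI or NFS to show up. The sketch below is an illustration, not pfio code: it assumes the pre-fix logic is a plain `os.path.exists` check followed by `os.makedirs`, uses threads in place of the four MPI processes, and uses `threading.Barrier` to force the bad interleaving deterministically instead of waiting for it to happen by chance.

```python
import os
import tempfile
import threading


def racy_create(path, barrier, errors):
    # Check-then-create: the non-atomic pattern behind the traceback.
    if not os.path.exists(path):
        # Hold every worker here until all of them have seen "missing",
        # so none of them can create the directory before the others check.
        barrier.wait()
        try:
            os.makedirs(path)  # exist_ok defaults to False
        except FileExistsError as exc:
            errors.append(exc)


def reproduce(n_workers=4):
    """Return how many workers failed with FileExistsError."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "non-existent-dir")
        barrier = threading.Barrier(n_workers)
        errors = []
        threads = [
            threading.Thread(target=racy_create, args=(path, barrier, errors))
            for _ in range(n_workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return len(errors)


if __name__ == "__main__":
    # Every worker passes the existence check, one mkdir wins, the rest fail.
    print(reproduce(4))  # 3
```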
This PR fixes an issue where `from_url(xxx, create=True)`, implemented in #245, can fail when several processes run it in parallel.

When the `create=True` option is set, pfio checks whether the directory exists and creates it if it does not, but this check-then-create sequence is not atomic. This is especially a problem on NFS, where a filesystem operation may become visible to other processes only after a relatively long delay. Making the whole sequence a single atomic filesystem operation is quite difficult, but in this case it is sufficient to pass the `exist_ok=True` option to `makedirs`.

The same issue can occur with HDFS, so I fixed both the local and HDFS filesystems in this PR (the fix is not necessary for S3).
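A minimal sketch of the local-filesystem fix, assuming it amounts to calling `os.makedirs(path, exist_ok=True)` without a separate existence check. The helper name `create_dir` is hypothetical and not part of pfio's API, and threads again stand in for MPI processes. The second function shows that `exist_ok=True` only tolerates an existing *directory*: a regular file at the same path still raises `FileExistsError`, so real conflicts are not silently hidden.

```python
import os
import tempfile
import threading


def create_dir(path):
    # Hypothetical helper: one makedirs call with exist_ok=True and no
    # separate existence check, so a directory created concurrently by
    # another worker counts as success.
    os.makedirs(path, exist_ok=True)


def run(n_workers=4):
    """Return (number of failed workers, whether the directory exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "non-existent-dir")
        barrier = threading.Barrier(n_workers)
        errors = []

        def worker():
            barrier.wait()  # release all workers at the same moment
            try:
                create_dir(path)
            except OSError as exc:
                errors.append(exc)

        threads = [threading.Thread(target=worker) for _ in range(n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return len(errors), os.path.isdir(path)


def file_in_the_way():
    """Return True if a regular file at the target path is still rejected."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "not-a-dir")
        open(path, "w").close()  # create an empty regular file
        try:
            create_dir(path)
        except FileExistsError:
            return True
        return False


if __name__ == "__main__":
    print(run(4))             # (0, True)
    print(file_in_the_way())  # True
```

On a local filesystem the underlying `mkdir` call is atomic, so exactly one worker creates the directory; the others get `EEXIST`, which `makedirs` treats as success once it confirms the path is a directory.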