pfnet / pfio

IO library to access various filesystems with unified API
https://pfio.readthedocs.io/
MIT License

Prevent race condition for from_url(create=True) in case of parallel processing #254

Closed belltailjp closed 2 years ago

belltailjp commented 2 years ago

This PR fixes an issue where from_url(xxx, create=True), implemented in #245, can fail when several processes run in parallel.

# from_url.py
import pfio
pfio.v2.from_url('non-existent-dir', create=True)
> mpiexec -n 4 python from_url.py
Traceback (most recent call last):
  File "from_url.py", line 5, in <module>
    pfio.v2.from_url('non-existent-dir', create=True)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/fs.py", line 359, in from_url
    fs = _from_scheme(scheme, dirname, kwargs, bucket=parsed.netloc)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/fs.py", line 371, in _from_scheme
    fs = Local(dirname, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/local.py", line 56, in __init__
    os.makedirs(self._cwd)
  File "/usr/local/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'non-existent-dir'

When the create=True option is set, pfio checks whether the directory exists and creates it if it does not. This check-then-create sequence is not atomic, so several processes can all see the directory as missing and all call makedirs; those that lose the race fail with FileExistsError. The window is wider on NFS, where there can be a relatively large delay before a filesystem operation becomes visible to other processes. Making the whole sequence atomic is quite difficult, but in this case it is sufficient to pass the exist_ok=True option to makedirs.
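
The difference between the two approaches can be sketched with the standard library alone. This is a minimal illustration, not the actual pfio code: the helper names ensure_dir_racy and ensure_dir_safe are hypothetical, and threads stand in for the MPI processes used in the reproduction above.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_dir_racy(path):
    # Hypothetical helper showing the check-then-create pattern.
    # Another worker may create `path` between exists() and makedirs(),
    # in which case makedirs() raises FileExistsError.
    if not os.path.exists(path):
        os.makedirs(path)


def ensure_dir_safe(path):
    # Hypothetical helper showing the fix: with exist_ok=True,
    # makedirs() does not raise if the directory is already there.
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "non-existent-dir")
    # Four workers try to create the same directory concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(ensure_dir_safe, [target] * 4))
    created = os.path.isdir(target)
```

Note that exist_ok=True only tolerates an existing directory; if the path exists as a regular file, makedirs still raises FileExistsError.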

The same issue can happen with HDFS, so I fixed both the local and HDFS filesystems in this PR (for S3 it is not necessary).

kuenishi commented 2 years ago

/test

pfn-ci-bot commented 2 years ago

Successfully created a job for commit 1e373c9:
