Closed alexwlchan closed 6 years ago
Why are computers hard?
Well, it takes 20 lines of code just to copy a file…
rough scribblings before bed (needs comments!):
# https://stackoverflow.com/q/11614815/1558022
import errno
import glob
import os
import shutil
import uuid
def remove_f(path):
try:
os.remove(path)
except OSError as err:
if err.errno == errno.ENOENT:
pass
else:
raise
if os.path.exists(dst):
return
mole_id = uuid.uuid4()
tmp_dst = shutil.copyfile(src, '%s.%s.tmp' % (dst, mole_id)))
mole_dst = '%s.%s.mole.tmp' % (dst, mole_id)
os.rename(tmp_dst, mole_dst)
matching = sorted(glob.glob('dst.*.mole.tmp'))
if mole_dst != matching[0]:
remove_f(mole_dst)
mole_dst = matching[0]
if os.path.exists(dst):
remove_f(mole_dst)
return
try:
os.rename(mole_dst, dst)
except OSError as err:
if err.errno == errno.ENOENT:
pass
else:
raise
os.unlink(src)
I may be coming to the conversation late. Is there any risk of a file starting to be read in between the two atomic writes? Or is this vanishingly small?
-Joseph
On Fri, Nov 3, 2017 at 12:56 PM, bcail notifications@github.com wrote:
@bcail commented on this pull request.
In loris/utils.py https://github.com/loris-imageserver/loris/pull/380#discussion_r148838394 :
- Moves must work across filesystems. Often temp directories and the
- cache directories live on different filesystems.
os.rename()
can- throw errors if run across filesystems.
- So we try
os.rename()
, but if we detect a cross-filesystem copy, we- switch to
shutil.move()
with some wrappers to make it atomic.- """
- logger.debug('Renaming %r to %r', src, dst)
- try:
- os.rename(src, dst)
- except OSError as err:
- logger.debug('Calling os.rename(%r, %r) failed with %r', src, dst, err)
- if err.errno == errno.EXDEV:
This is based on the safe, atomic file copying algorithm
described in https://stackoverflow.com/a/28090883/1558022.
Isn't some of this unnecessary, because we don't care if the file gets overwritten? If one process creates dst atomically, and then another process overwrites it atomically, is there any harm?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/loris-imageserver/loris/pull/380#pullrequestreview-74146809, or mute the thread https://github.com/notifications/unsubscribe-auth/ABcoaJY-PIrFpQHx-6PyrxBYHsY_vMGBks5sy0XJgaJpZM4QBPjj .
@jrhoads I don't know what the odds are of that happening. If that is an issue (a file being overwritten), I think we'll have the same issue if the directories are on the same filesystem and we can just use the atomic rename (in the first try block of safe_rename). And the issue is already happening in any Loris installs. Does anyone know if overwriting files has been an issue?
I ran some tests, and it looks like there's no issue with doing an os.rename of one file onto an identical one while it's being read.
import asyncio import os import shutil async def rename_file(): os.rename('test2.txt', 'test.txt') print(' *renamed test2.txt to test.txt*') async def read_file(): print('opening file') with open('test.txt', 'rt') as f: print('reading first 10 bytes') first_data = f.read(10) await rename_file() print('reading rest of file') rest_of_data = f.read() print('finished reading file - closed it') print(f'first_data: {first_data}') print(f'rest_of_data: {rest_of_data}') if __name__ == '__main__': assert not os.path.exists('test2.txt') shutil.copyfile('test.txt', 'test2.txt') print('copied test.txt to test2.txt') event_loop = asyncio.get_event_loop() try: print('starting coroutine') coro = read_file() print('entering event loop') event_loop.run_until_complete(coro) finally: print('closing event loop') event_loop.close() assert not os.path.exists('test2.txt')
Isn't some of this unnecessary, because we don't care if the file gets overwritten?
Yeah, that version was a bit OTT. I’ve cut it back and pushed a new version.
This uses the proposed approach in #379 – first try to copy the file using
os.rename()
, but if we detect we’re trying to copy across a filesystem boundary and the OS gets upset, we fall back to usingshutil.move()
in a way that’s atomic.Resolves #379.
/cc @giorgiosironi @bcail