iterative / dvc-objects

dvc objects - contains filesystem and object-db level abstractions to use in dvc and dvc-data
https://dvc.org
Apache License 2.0
12 stars 21 forks source link

dvc push may fail for remote hosted on network attached drives #306

Open abahnasy opened 2 months ago

abahnasy commented 2 months ago

OS: Linux Python 3.8 dvc 3.55.2 dvc-azure 3.1.0 dvc-data 3.16.5 dvc-http 2.32.0 dvc-objects 5.1.0 dvc-render 1.0.2 dvc-studio-client 0.21.0 dvc-task 0.4.0

Hi,

I come across a special case where the dvc push fails due to hosting the remote on a network shared drive (Samba Drive).

During pushing the files from the local cache to the remote storage, DVC will reach copyfile here and attempt to do CoW (Copy on Write) here but due to the fact that both (cache and remote) are on different file systems, it will fail and try to unlink here the linking happened due to calling fcntl.ioctl() here and then copyfile will try to copy the file using shutil.copyfile(src, dst) here. At this point, the call fails due to FileNotFoundError exception as the dst is seen as file (os.path.exists(dst) returns True) but it is not actually found on the filesystem.

it seems that due to network latency or caching from SMB drive, the unlinking here is not reflected immediately and I've to wait few milliseconds until the dst path is removed and could be seen as not found on the filesystem (os.path.exists(dst) returns False). Then, normal copying using shutil.copyfile(src, dst) could work fine.

I've written a simple script by extracting the code snippets from lib to reproduce the error.

import os
import shutil
from secrets import token_urlsafe
import fcntl
import time

FICLONE = 0x40049409
def reflink(src, dst):
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    except OSError:
        os.close(src_fd)
        raise

    try:

        fcntl.ioctl(dst_fd, FICLONE, src_fd)
    except OSError as e:
        print(">>>", e)
        os.close(src_fd)
        os.close(dst_fd)
        try:
            os.unlink(dst)

            ### Temp fix, comment below code to reproduce the error ###
            while os.path.exists(dst): # Check if file still exists
                time.sleep(0.1)
                print(f"{dst} still exists after 0.1 second")
            ### Temp fix, comment above code to reproduce the error ###

            print(dst, "exists: ", os.path.exists(dst)) # after unlink, the `dst` shouldn't be found
        except OSError:
            print(">>> failed to unlink")
            pass
        raise
    else:
        os.close(src_fd)
        os.close(dst_fd)

def tmp_fname(prefix: str = "") -> str:
    """Temporary name for a partial download"""
    return f"{prefix}.{token_urlsafe(16)}.tmp"

rpath = "/mnt/interior_sensing/test-dvc/randombinfile.bin" #dest Samba drive
parent = "/mnt/interior_sensing/test-dvc"
lpath = "./randombitsfile.bin" # local filesystem
tmp_file = os.path.join(parent, tmp_fname())
# create random file
fileSizeInBytes = 1024*1024
with open(lpath, 'wb') as fout:
    fout.write(os.urandom(fileSizeInBytes)) 
try:
    reflink(lpath, tmp_file)
    # NOTE: reflink may or may not clone src permissions
    os.chmod(dest, 0o666 & ~umask)
    os.replace(tmp_file, rpath)
    print("Reflink Successfully !!!")
    exit()
except OSError:
    pass
# if reflink is succeeded, code will not reach here. Otherwise, do normal copy
shutil.copyfile(lpath, tmp_file)
os.replace(tmp_file, rpath)

Error output:

>>> [Errno 18] Invalid cross-device link
/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp exists:  True
Traceback (most recent call last):
  File "/home/user/env/test_dvc_script.py", line 67, in <module>
    shutil.copyfile(lpath, tmp_file)
  File "/home/user/env/lib/python3.9/shutil.py", line 266, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp'

Output after applying the temp fix

>>> [Errno 18] Invalid cross-device link #due to the fact that src, dst are not different filesystems
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp exists:  False
(env) user@machine:~/workspace$ ls /mnt/interior_sensing/test-dvc/
randombinfile.bin # file copied successfully

I'm not sure if this a case that should be handled or it is a case specific to my setup, but I thought it is worth mentioning.

shcheklein commented 2 months ago

@skshetry can it be related to the recent changes?

shcheklein commented 2 months ago

@skshetry what is your take here?

skshetry commented 1 month ago

This does not look related to recent changes at all.

with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp

This should never fail as we are opening in mode="wb", unless /mnt/interior_sensing/test-dvc does not exist.

@abahnasy, does that directory exist? Can you try to add os.makedirs(...) in the script above and see if it works? (I guess it does since unlink did not return any error).