python / cpython

The Python programming language
https://www.python.org/

Use copy_file_range() in shutil.copyfile() (server-side copy) #81340

Open giampaolo opened 5 years ago

giampaolo commented 5 years ago
BPO 37159
Nosy @facundobatista, @ncoghlan, @vstinner, @giampaolo, @encukou, @albertz, @vadmium, @desbma, @pablogsal
Files
  • patch.diff
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['library', '3.9', 'performance']
    title = 'Use copy_file_range() in shutil.copyfile() (server-side copy)'
    updated_at =
    user = 'https://github.com/giampaolo'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'Albert.Zeyer'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'giampaolo.rodola'
    dependencies = []
    files = ['48392']
    hgrepos = []
    issue_num = 37159
    keywords = ['patch']
    message_count = 6.0
    messages = ['344671', '344679', '344680', '344691', '344693', '383996']
    nosy_count = 11.0
    nosy_names = ['facundobatista', 'ncoghlan', 'vstinner', 'giampaolo.rodola', 'StyXman', 'petr.viktorin', 'neologix', 'Albert.Zeyer', 'martin.panter', 'desbma', 'pablogsal']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue37159'
    versions = ['Python 3.9']
    ```

    giampaolo commented 5 years ago

    This is a follow-up to bpo-33639 (zero-copy via sendfile()) and bpo-26828 (os.copy_file_range()). On Linux >= 4.5 with glibc >= 2.27, shutil.copyfile() will use os.copy_file_range() instead of os.sendfile(). According to my benchmarks, performance is the same, but on NFS copy_file_range() is supposed to attempt a server-side copy, meaning no data is exchanged between client and server, which should make the copy an order of magnitude faster.
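For illustration only, here is a minimal sketch of what a copy_file_range()-based copy loop could look like. It is not the attached patch.diff and not what shutil does today; the function name `copyfile_range`, the 128 MiB chunk size, and the "fall back on any early OSError" policy are all assumptions made for this example.

```python
import os
import shutil


def copyfile_range(src, dst, chunk_size=2**27):
    """Copy src to dst, preferring os.copy_file_range() when it is usable.

    Hypothetical helper for illustration; returns the name of the
    strategy that was used ("copy_file_range" or "fallback").
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        # os.copy_file_range() only exists on Linux builds of Python 3.8+.
        if not hasattr(os, "copy_file_range"):
            shutil.copyfileobj(fsrc, fdst)
            return "fallback"

        infd, outfd = fsrc.fileno(), fdst.fileno()
        copied = 0
        while True:
            try:
                # Offsets are left as None, so the kernel advances both
                # file offsets itself, much like read()/write() would.
                n = os.copy_file_range(infd, outfd, chunk_size)
            except OSError:
                if copied:
                    # Part of the data is already written; do not retry.
                    raise
                # Nothing copied yet (old kernel, unsupported filesystem,
                # sandbox restrictions, ...): use the plain userspace copy.
                # Genuine I/O errors will resurface from the fallback.
                shutil.copyfileobj(fsrc, fdst)
                return "fallback"
            if n == 0:
                # Zero bytes copied means end of file.
                break
            copied += n
        return "copy_file_range"
```

Whether the kernel actually performs a server-side copy depends on the filesystem (for NFS, on the client and server both supporting it); the calling code looks the same either way.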

    Before proceeding, unit tests for big-file support should be added (bpo-37096). We didn't hit the 3.8 deadline, but I actually prefer to land this in 3.9 as I want to experiment with it a bit (copy_file_range() is quite new, and bpo-26828 is still a WIP).

    vstinner commented 5 years ago

    Oh, I already created https://bugs.python.org/issue37157

    Can we move the discussion there?

    giampaolo commented 5 years ago

    bpo-37157 is for reflink / CoW copy, this one is not.

    vstinner commented 5 years ago

    > bpo-37157 is for reflink / CoW copy, this one is not.

    Oh sorry, it seems like I misunderstood copy_file_range(). So it doesn't use/support CoW?

    giampaolo commented 5 years ago

    Nope, it doesn't (see man page). We can simply use FICLONE (cp does the same).
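As a rough sketch of the FICLONE approach mentioned here (an illustration, not code from this issue): the clone is requested with an ioctl on the destination descriptor, passing the source descriptor as the argument. The helper name `clone_or_copy` is hypothetical, and the numeric fallback `0x40049409` is an assumption that holds for the common Linux ioctl encoding (for example x86 and arm64) but not for every architecture. Newer Python versions expose the constant as `fcntl.FICLONE`.

```python
import shutil

try:
    import fcntl
except ImportError:  # e.g. Windows, where fcntl does not exist
    fcntl = None

# Assumption: use fcntl.FICLONE when Python provides it, otherwise the
# value of _IOW(0x94, 9, int) under the common Linux ioctl encoding.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def clone_or_copy(src, dst):
    """Try a reflink (CoW) clone of src into dst; fall back to a plain copy.

    Hypothetical helper for illustration. Returns True if the file was
    cloned with FICLONE, False if a regular copy was made instead.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if fcntl is not None:
            try:
                # ioctl(dest_fd, FICLONE, src_fd): share src's blocks with dst.
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
                return True
            except OSError:
                # Filesystem without reflink support, cross-device copy,
                # non-Linux platform, ...: nothing was written, so a
                # regular copy from offset 0 is still safe.
                pass
        shutil.copyfileobj(fsrc, fdst)
        return False
```

On filesystems without reflink support (ext4 or tmpfs, for instance) the ioctl is expected to fail and the helper makes an ordinary copy.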

    bf28a09e-6d72-4c82-a4b1-8cc669e32358 commented 3 years ago

    According to the man page of copy_file_range (https://man7.org/linux/man-pages/man2/copy_file_range.2.html), copy_file_range also should support copy-on-write:

      copy_file_range() gives filesystems an opportunity to implement
      "copy acceleration" techniques, such as the use of reflinks
      (i.e., two or more inodes that share pointers to the same copy-
      on-write disk blocks) or server-side-copy (in the case of NFS).

    Is this wrong?

    However, while researching more about FICLONE vs copy_file_range, I found e.g. this: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24399

    Which suggests that there are other problems with copy_file_range?

    illia-v commented 2 years ago

    FYI, GNU Coreutils 9.0 (released in September 2021) changed the behavior of cp; see the release announcement:

    https://lists.gnu.org/archive/html/info-gnu/2021-09/msg00010.html