python / cpython

copy2 doesn't copy metadata on Windows and MacOS #83087

Open 78173dca-9270-4caf-abbb-d619aa8f2ad7 opened 4 years ago

78173dca-9270-4caf-abbb-d619aa8f2ad7 commented 4 years ago
BPO 38906
Nosy @pfmoore, @ronaldoussoren, @giampaolo, @tjguk, @ned-deily, @zware, @eryksun, @zooba, @greggman

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['OS-mac', '3.8', 'type-bug', 'library', 'OS-windows']
title = "copy2 doesn't copy metadata on Windows and MacOS"
updated_at =
user = 'https://github.com/greggman'
```

bugs.python.org fields:
```python
activity =
actor = 'ronaldoussoren'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'macOS', 'Windows']
creation =
creator = 'greggman'
dependencies = []
files = []
hgrepos = []
issue_num = 38906
keywords = []
message_count = 6.0
messages = ['357395', '357411', '357454', '357564', '357572', '357912']
nosy_count = 9.0
nosy_names = ['paul.moore', 'ronaldoussoren', 'giampaolo.rodola', 'tim.golden', 'ned.deily', 'zach.ware', 'eryksun', 'steve.dower', 'greggman']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue38906'
versions = ['Python 3.8']
```

78173dca-9270-4caf-abbb-d619aa8f2ad7 commented 4 years ago

macOS has extended file attributes. Windows has both extended file attributes and alternate data streams. On both OSes, copy/cp and, of course, the Finder and Windows Explorer copy all this data. Python's copy2 does not.

On Windows it seems like CopyFileW needs to be called to actually do a full copy.

https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-copyfilew
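
For illustration, here is a minimal sketch of what calling CopyFileW from Python could look like via ctypes. The helper name `copy_file_native` is hypothetical and not part of shutil. Falling back to `shutil.copy2` on other platforms is an assumption made only so the sketch runs everywhere.

```python
import ctypes
import shutil
import sys


def copy_file_native(src, dst, fail_if_exists=False):
    """Hypothetical helper: copy src to dst with CopyFileW on Windows."""
    if sys.platform != "win32":
        # Assumption for this sketch: keep the existing behaviour elsewhere.
        return shutil.copy2(src, dst)

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # BOOL CopyFileW(LPCWSTR lpExistingFileName, LPCWSTR lpNewFileName,
    #                BOOL bFailIfExists)
    kernel32.CopyFileW.argtypes = (
        wintypes.LPCWSTR,
        wintypes.LPCWSTR,
        wintypes.BOOL,
    )
    kernel32.CopyFileW.restype = wintypes.BOOL

    if not kernel32.CopyFileW(src, dst, fail_if_exists):
        raise ctypes.WinError(ctypes.get_last_error())
    return dst
```

Unlike `shutil.copy2`, this sketch expects `dst` to be a full file path on Windows, not a directory.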

On MacOS it appears to require copyItem

https://developer.apple.com/documentation/foundation/filemanager/1412957-copyitem

It's unexpected to call a function to copy a file and have it not copy the whole file, so that the user loses data.

Windows example

> dir

    Directory: C:\Users\gregg\temp\test

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       11/24/2019   8:58 PM             28 original.txt

> Set-Content -Path original.txt -Stream FooBar

cmdlet Set-Content at command pipeline position 1
Supply values for the following parameters:
Value[0]: python should copy this too
Value[1]:

> Get-Content -Path original.txt -Stream FooBar
python should copy this too
> copy .\original.txt .\copied-with-copy.txt
> Get-Content -Path copied-with-copy.txt -Stream FooBar
python should copy this too
> C:\Users\gregg\AppData\Local\Programs\Python\Python38\python.exe
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.copy2("original.txt", "copied-with-python3.txt")
>>> exit()
> Get-Content -Path copied-with-python3.txt -Stream FooBar
Get-Content : Could not open the alternate data stream 'FooBar' of the file 'C:\Users\gregg\temp\test\copied-with-python3.txt'.
At line:1 char:1
+ Get-Content -Path copied-with-python3.txt -Stream FooBar
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     + CategoryInfo          : ObjectNotFound: (C:\Users\gregg\...ith-python3.txt:String) [Get-Content], FileNotFoundException
     + FullyQualifiedErrorId : GetContentReaderFileNotFoundError,Microsoft.PowerShell.Commands.GetContentCommand

MacOS example

$ ls -l -@
total 1120
-rw-r--r--@ 1 gregg  staff  571816 Nov 24 18:48 original.jpg
    com.apple.lastuseddate#PS       16 
    com.apple.macl      72 
    com.apple.metadata:kMDItemWhereFroms       530 
    com.apple.quarantine        57 
$ cp original.jpg copied-with.cp 
$ ls -l -@                      
total 2240
-rw-r--r--@ 1 gregg  staff  571816 Nov 24 18:48 copied-with.cp
    com.apple.lastuseddate#PS       16 
    com.apple.macl      72 
    com.apple.metadata:kMDItemWhereFroms       530 
    com.apple.quarantine        57 
-rw-r--r--@ 1 gregg  staff  571816 Nov 24 18:48 original.jpg
    com.apple.lastuseddate#PS       16 
    com.apple.macl      72 
    com.apple.metadata:kMDItemWhereFroms       530 
    com.apple.quarantine        57
$ python3
Python 3.8.0 (default, Nov 24 2019, 18:48:01) 
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.copy2('original.jpg', 'copied-with-python3.jpg')
'copied-with-python3.jpg'
>>> exit()
$ ls -l -@
total 3360
-rw-r--r--@ 1 gregg  staff  571816 Nov 24 18:48 copied-with-python3.jpg
    com.apple.quarantine        57 
-rw-r--r--@ 1 gregg  staff  571816 Nov 24 18:48 copied-with.cp
    com.apple.lastuseddate#PS       16 
    com.apple.macl      72 
    com.apple.metadata:kMDItemWhereFroms       530 
    com.apple.quarantine        57 
-rw-r--r--@ 1 gregg  staff  571816 Nov 24 18:48 original.jpg
    com.apple.lastuseddate#PS       16 
    com.apple.macl      72 
    com.apple.metadata:kMDItemWhereFroms       530 
    com.apple.quarantine        57
eryksun commented 4 years ago

In Windows, using CopyFileExW and CreateDirectoryExW (with a template directory, for copytree) doesn't agree with how shutil implements copying with separate copyfile and copymode/copystat functions. We'd have to extend copyfile() to support alternate data streams, and we'd have to extend copystat() to support file attributes, extended attributes, and security-resource attributes. Or we'd have to abandon consistency between copy2 and copyfile+copystat.

zooba commented 4 years ago

As a general statement (I haven't taken the time yet to understand all the intricacies of this particular issue), I would prefer to see CopyFile used everywhere on Windows for the performance and reliability benefits. Perfect POSIX emulation here doesn't seem helpful.

giampaolo commented 4 years ago

If we use CopyFileExW in copy2(), then also copystat() and copymode() should be able to copy the same metadata/security-bits/etc as CopyFileExW. I don't know which Windows APIs should be used though.

I sort of agree with Steve that CopyFileExW could be used everywhere (meaning copyfile()), but that basically means breaking backward compatibility regarding what is promised in the docs, and I am not sure how to deal with that. For that reason it's probably better to leave copyfile() alone.

On macOS it seems we can use the fcopyfile(3) syscall (which is already exposed) and its flags argument (COPYFILE_METADATA, COPYFILE_DATA, COPYFILE_XATTR, etc.) to implement both copy2() and copystat() / copymode().
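
As a rough sketch of that idea, the path-based copyfile(3) variant can be called through ctypes. The flag values are the ones defined in macOS `<copyfile.h>`. The helper name `copy_all` is hypothetical, and the fallback to `shutil.copy2` off macOS is an assumption so the example runs on other platforms.

```python
import ctypes
import os
import shutil
import sys

# Flag values as defined in macOS <copyfile.h>.
COPYFILE_ACL = 1 << 0
COPYFILE_STAT = 1 << 1
COPYFILE_XATTR = 1 << 2
COPYFILE_DATA = 1 << 3
COPYFILE_SECURITY = COPYFILE_STAT | COPYFILE_ACL
COPYFILE_METADATA = COPYFILE_SECURITY | COPYFILE_XATTR
COPYFILE_ALL = COPYFILE_METADATA | COPYFILE_DATA


def copy_all(src, dst, flags=COPYFILE_ALL):
    """Hypothetical helper: copy data and/or metadata with copyfile(3)."""
    if sys.platform != "darwin":
        # Assumption for this sketch: keep the existing behaviour elsewhere.
        return shutil.copy2(src, dst)

    # CDLL(None) resolves symbols already loaded into the process,
    # which includes copyfile() from libSystem.
    libc = ctypes.CDLL(None, use_errno=True)
    # int copyfile(const char *from, const char *to,
    #              copyfile_state_t state, copyfile_flags_t flags)
    libc.copyfile.argtypes = (
        ctypes.c_char_p,
        ctypes.c_char_p,
        ctypes.c_void_p,
        ctypes.c_uint32,
    )
    libc.copyfile.restype = ctypes.c_int

    if libc.copyfile(os.fsencode(src), os.fsencode(dst), None, flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), src)
    return dst
```

With this shape, a copystat()-like call would pass `COPYFILE_METADATA` and a copy2()-like call would pass `COPYFILE_ALL`.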

eryksun commented 4 years ago

> copystat() and copymode() should be able to copy the same metadata/security-bits/etc as CopyFileExW.

Regarding metadata, CopyFileExW copies the basic file info (i.e. FileAttributes, LastAccessTime, LastWriteTime, and ChangeTime). This metadata can be copied separately as the FileBasicInfo, via GetFileInformationByHandleEx and SetFileInformationByHandle. (Zero the CreationTime field to skip setting it.) Note that this includes the change time (i.e. Unix st_ctime) in file systems that support it, such as NTFS and ReFS. It also includes all settable file attributes, including readonly, hidden, system, archive, temporary, and not-content-indexed. Currently we only copy the readonly attribute.
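
To make the attribute part concrete, here is a minimal sketch that copies the settable attributes listed above. Note that it uses the simpler path-based GetFileAttributesW / SetFileAttributesW calls, not the handle-based FileBasicInfo approach described in this comment, so it does not copy timestamps. The helper names are hypothetical, and the function is a no-op outside Windows.

```python
import ctypes
import stat
import sys

# The settable attributes named above; other bits (e.g. compressed,
# encrypted) cannot be changed through SetFileAttributesW.
SETTABLE_ATTRIBUTES = (
    stat.FILE_ATTRIBUTE_READONLY
    | stat.FILE_ATTRIBUTE_HIDDEN
    | stat.FILE_ATTRIBUTE_SYSTEM
    | stat.FILE_ATTRIBUTE_ARCHIVE
    | stat.FILE_ATTRIBUTE_TEMPORARY
    | stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED
)
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF


def settable_attributes(attrs):
    """Mask a file-attribute value down to the bits this sketch copies."""
    return attrs & SETTABLE_ATTRIBUTES


def copy_file_attributes(src, dst):
    """Hypothetical helper: copy settable file attributes from src to dst."""
    if sys.platform != "win32":
        return  # Assumption for this sketch: nothing to do elsewhere.

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetFileAttributesW.argtypes = (wintypes.LPCWSTR,)
    kernel32.GetFileAttributesW.restype = wintypes.DWORD
    kernel32.SetFileAttributesW.argtypes = (wintypes.LPCWSTR, wintypes.DWORD)
    kernel32.SetFileAttributesW.restype = wintypes.BOOL

    attrs = kernel32.GetFileAttributesW(src)
    if attrs == INVALID_FILE_ATTRIBUTES:
        raise ctypes.WinError(ctypes.get_last_error())
    # Use FILE_ATTRIBUTE_NORMAL when no other attribute is set.
    new_attrs = settable_attributes(attrs) or stat.FILE_ATTRIBUTE_NORMAL
    if not kernel32.SetFileAttributesW(dst, new_attrs):
        raise ctypes.WinError(ctypes.get_last_error())
```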

Regarding security bits, CopyFileExW copies security resource attributes (i.e. ATTRIBUTE_SECURITY_INFORMATION), which in Windows 8+ can be referenced by arbitrarily complex expressions in conditional ACEs. See "[MS-DTYP] 2.4.4.17 Conditional ACEs" [1] for details. Security resource attributes can be queried and set by handle via GetSecurityInfo and SetSecurityInfo. This information is set in the system access control list (SACL), but it does not require the privileged ACCESS_SYSTEM_SECURITY right. It only requires READ_CONTROL and WRITE_DAC access.

CopyFileExW also copies extended attributes. These are commonly set on system files, but in this case they're usually "$Kernel." attributes [2], which cannot be set from user mode. IIRC, WSL also uses them. Otherwise extended attributes are not used much at all because the Windows API provides no way to query and set them separately. (They're supported in the NT API via NtQueryEaFile and NtSetEaFile.) When creating a new file via CreateFileW, we can pass it a handle to a template file from which to copy extended attributes. But that doesn't help with copystat(), which requires copying extended attributes onto an existing file.

Regarding data, CopyFileExW copies all $DATA streams [3] in a file, not just the anonymous $DATA stream. The stream names and sizes can be read via GetFileInformationByHandleEx: FileStreamInfo. Just loop over the stream names to try copying them individually.
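
A minimal sketch of that loop follows, assuming the stream names have already been obtained (for example from FileStreamInfo, which is not shown here). It relies on the NTFS `path:streamname` syntax, which Python's `open()` passes through to the OS. The helper names `stream_path` and `copy_named_streams` are hypothetical.

```python
import os
import shutil


def stream_path(path, stream_name):
    """Build the 'file:stream' path for a named data stream.

    The path is made absolute so that a short relative name such as
    'a:FooBar' is not mistaken for a drive-relative path on Windows.
    """
    return f"{os.path.abspath(path)}:{stream_name}"


def copy_named_streams(src, dst, stream_names):
    """Hypothetical helper: copy each named stream of src onto dst.

    Only meaningful on file systems with alternate data streams (NTFS,
    ReFS). On other systems a name with a colon is just an ordinary,
    separate file.
    """
    for name in stream_names:
        with open(stream_path(src, name), "rb") as fin, \
                open(stream_path(dst, name), "wb") as fout:
            shutil.copyfileobj(fin, fout)
```

For the Windows example at the top of this issue, that would be `copy_named_streams("original.txt", "copied-with-python3.txt", ["FooBar"])` after the copy2() call.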

For complete consistency, copytree should copy named data streams in directories. (A directory can't have an anonymous data stream, but it can have named streams, and it's not uncommon to store metadata about a directory like this. Ignoring this data is inconsistent, but it's a matter of opinion whether complete consistency is worthwhile.) This can be implemented at a high level via CreateDirectoryExW by passing the source directory path as the lpTemplateDirectory parameter. However, CreateDirectoryExW also preserves whether the directory is compressed or encrypted. I don't know whether copytree should preserve those attributes.

[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/10dc22eb-788d-4343-b556-0b6969fe58ca
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/kernel-extended-attributes
[3] https://docs.microsoft.com/en-us/windows/win32/fileio/file-streams

ronaldoussoren commented 4 years ago

The relevant API on macOS is [f]copyfile(3), which is at the POSIX layer. The copyItem API linked to in msg357395 is a higher-level ObjC/Swift API.

Using the copyfile(3) API has another advantage beyond this issue: This API can perform a clone action on APFS when the right flags are used.