Open 78173dca-9270-4caf-abbb-d619aa8f2ad7 opened 4 years ago
macOS has extended file attributes. Windows has both extended file attributes and alternate data streams. On both OSes, copy/cp and of course the Finder and Windows Explorer copy all this data. Python's shutil.copy2 does not.
On Windows it seems like CopyFileW needs to be called to actually do a full copy.
https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-copyfilew
On MacOS it appears to require copyItem
https://developer.apple.com/documentation/foundation/filemanager/1412957-copyitem
It's unexpected to call a function that is supposed to copy a file and have it copy only part of the file, so that the user loses data.
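As a rough illustration of the Windows suggestion above, here is a minimal sketch that calls CopyFileW through ctypes. The helper name `copy_with_os_api` is hypothetical and not part of shutil; outside Windows it simply defers to `shutil.copy2`, which is an assumption made only so the snippet runs anywhere.

```python
import ctypes
import shutil
import sys


def copy_with_os_api(src, dst, fail_if_exists=False):
    """Copy src to dst with CopyFileW on Windows (hypothetical helper)."""
    if sys.platform != "win32":
        # Assumption: on other platforms, keep the stdlib behaviour.
        return shutil.copy2(src, dst)

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # BOOL CopyFileW(LPCWSTR existing, LPCWSTR new, BOOL bFailIfExists)
    kernel32.CopyFileW.argtypes = (ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_int)
    kernel32.CopyFileW.restype = ctypes.c_int
    if not kernel32.CopyFileW(str(src), str(dst), int(fail_if_exists)):
        raise ctypes.WinError(ctypes.get_last_error())
    return dst
```

Unlike `copy2`, this sketch does not accept a directory as the destination; `dst` must be a full file path.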
Windows example
> dir
Directory: C:\Users\gregg\temp\test
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 11/24/2019 8:58 PM 28 original.txt
> Set-Content -Path original.txt -Stream FooBar
cmdlet Set-Content at command pipeline position 1
Supply values for the following parameters:
Value[0]: python should copy this too
Value[1]:
> Get-Content -Path original.txt -Stream FooBar
python should copy this too
> copy .\original.txt .\copied-with-copy.txt
> Get-Content -Path copied-with-copy.txt -Stream FooBar
python should copy this too
> C:\Users\gregg\AppData\Local\Programs\Python\Python38\python.exe
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.copy2("original.txt", "copied-with-python3.txt")
>>> exit()
> Get-Content -Path copied-with-python3.txt -Stream FooBar
Get-Content : Could not open the alternate data stream 'FooBar' of the file 'C:\Users\gregg\temp\test\copied-with-python3.txt'.
At line:1 char:1
+ Get-Content -Path copied-with-python3.txt -Stream FooBar
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (C:\Users\gregg\...ith-python3.txt:String) [Get-Content], FileNotFoundException
+ FullyQualifiedErrorId : GetContentReaderFileNotFoundError,Microsoft.PowerShell.Commands.GetContentCommand
MacOS example
$ ls -l -@
total 1120
-rw-r--r--@ 1 gregg staff 571816 Nov 24 18:48 original.jpg
com.apple.lastuseddate#PS 16
com.apple.macl 72
com.apple.metadata:kMDItemWhereFroms 530
com.apple.quarantine 57
$ cp original.jpg copied-with.cp
$ ls -l -@
total 2240
-rw-r--r--@ 1 gregg staff 571816 Nov 24 18:48 copied-with.cp
com.apple.lastuseddate#PS 16
com.apple.macl 72
com.apple.metadata:kMDItemWhereFroms 530
com.apple.quarantine 57
-rw-r--r--@ 1 gregg staff 571816 Nov 24 18:48 original.jpg
com.apple.lastuseddate#PS 16
com.apple.macl 72
com.apple.metadata:kMDItemWhereFroms 530
com.apple.quarantine 57
$ python3
Python 3.8.0 (default, Nov 24 2019, 18:48:01)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> shutil.copy2('original.jpg', 'copied-with-python3.jpg')
'copied-with-python3.jpg'
>>> exit()
$ ls -l -@
total 3360
-rw-r--r--@ 1 gregg staff 571816 Nov 24 18:48 copied-with-python3.jpg
com.apple.quarantine 57
-rw-r--r--@ 1 gregg staff 571816 Nov 24 18:48 copied-with.cp
com.apple.lastuseddate#PS 16
com.apple.macl 72
com.apple.metadata:kMDItemWhereFroms 530
com.apple.quarantine 57
-rw-r--r--@ 1 gregg staff 571816 Nov 24 18:48 original.jpg
com.apple.lastuseddate#PS 16
com.apple.macl 72
com.apple.metadata:kMDItemWhereFroms 530
com.apple.quarantine 57
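To make the loss in the listing above explicit, here is a small sketch that parses `ls -l -@` output and reports which attribute names are missing from a copy. The function names are hypothetical, and the parser assumes the simple layout shown above (one file line followed by `name size` attribute lines, no spaces in file names).

```python
def parse_ls_xattrs(listing):
    """Map each file name to the set of xattr names listed beneath it."""
    result = {}
    current = None
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "total":
            continue
        # Attribute lines look like "<name> <size>"; file lines have more fields.
        if len(tokens) == 2 and tokens[1].isdigit() and current is not None:
            result[current].add(tokens[0])
        else:
            current = tokens[-1]  # file name is the last field
            result[current] = set()
    return result


def lost_xattrs(listing, original, copy):
    """Return the attribute names present on original but absent from copy."""
    attrs = parse_ls_xattrs(listing)
    return sorted(attrs[original] - attrs[copy])
```

Run against the final listing with `original.jpg` and `copied-with-python3.jpg`, this reports every attribute except `com.apple.quarantine` as lost.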
In Windows, using CopyFileExW and CreateDirectoryExW (with a template directory, for copytree) doesn't agree with how shutil implements copying with separate copyfile and copymode/copystat functions. We'd have to extend copyfile() to support alternate data streams, and we'd have to extend copystat() to support file attributes, extended attributes, and security-resource attributes. Or we'd have to abandon consistency between copy2 and copyfile+copystat.
As a general statement (I haven't taken the time yet to understand all the intricacies of this particular issue), I would prefer to see CopyFile used everywhere on Windows for the performance and reliability benefits. Perfect POSIX emulation here doesn't seem helpful.
If we use CopyFileExW in copy2(), then also copystat() and copymode() should be able to copy the same metadata/security-bits/etc as CopyFileExW. I don't know which Windows APIs should be used though.
I sort of agree with Steven that CopyFileExW could be used everywhere (meaning copyfile()), but that basically means breaking backward compatibility with what is promised in the docs, and I'm not sure how to deal with that. For that reason it's probably better to leave copyfile() alone.
On macOS it seems we can use the fcopyfile(3) syscall (which is already exposed) and its flags argument (COPYFILE_METADATA, COPYFILE_DATA, COPYFILE_XATTR, etc.) to implement both copy2() and copystat() / copymode().
> copystat() and copymode() should be able to copy the same metadata/security-bits/etc as CopyFileExW.
Regarding metadata, CopyFileExW copies the basic file info (i.e. FileAttributes, LastAccessTime, LastWriteTime, and ChangeTime). This metadata can be copied separately as the FileBasicInfo, via GetFileInformationByHandleEx and SetFileInformationByHandle. (Zero the CreationTime field to skip setting it.) Note that this includes the change time (i.e. Unix st_ctime) in file systems that support it, such as NTFS and ReFS. It also includes all settable file attributes, including readonly, hidden, system, archive, temporary, and not-content-indexed. Currently we only copy the readonly attribute.
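A minimal sketch of the FileBasicInfo approach described above, using ctypes. The helper name `copy_basic_info` is hypothetical. It assumes the destination can be opened for writing (so it would fail on a file that is already read-only), and on non-Windows platforms it just defers to `shutil.copystat` so the snippet stays runnable.

```python
import ctypes
import shutil
import sys

FILE_BASIC_INFO_CLASS = 0  # FileBasicInfo in FILE_INFO_BY_HANDLE_CLASS


class FILE_BASIC_INFO(ctypes.Structure):
    _fields_ = [
        ("CreationTime", ctypes.c_int64),
        ("LastAccessTime", ctypes.c_int64),
        ("LastWriteTime", ctypes.c_int64),
        ("ChangeTime", ctypes.c_int64),
        ("FileAttributes", ctypes.c_uint32),
    ]


def copy_basic_info(src, dst):
    """Copy timestamps and file attributes from src to dst (hypothetical helper)."""
    if sys.platform != "win32":
        shutil.copystat(src, dst)  # assumption: fallback for other platforms
        return

    import msvcrt

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_info = kernel32.GetFileInformationByHandleEx
    set_info = kernel32.SetFileInformationByHandle
    for fn in (get_info, set_info):
        fn.argtypes = (ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32)
        fn.restype = ctypes.c_int

    info = FILE_BASIC_INFO()
    with open(src, "rb") as fsrc, open(dst, "r+b") as fdst:
        hsrc = msvcrt.get_osfhandle(fsrc.fileno())
        hdst = msvcrt.get_osfhandle(fdst.fileno())
        if not get_info(hsrc, FILE_BASIC_INFO_CLASS, ctypes.byref(info), ctypes.sizeof(info)):
            raise ctypes.WinError(ctypes.get_last_error())
        info.CreationTime = 0  # zero means "do not change this field"
        if not set_info(hdst, FILE_BASIC_INFO_CLASS, ctypes.byref(info), ctypes.sizeof(info)):
            raise ctypes.WinError(ctypes.get_last_error())
```

A real implementation would more likely open the handles with CreateFileW and only the attribute access rights it needs; borrowing handles from `open()` just keeps the sketch short.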
Regarding security bits, CopyFileExW copies security resource attributes (i.e. ATTRIBUTE_SECURITY_INFORMATION), which in Windows 8+ can be referenced by arbitrarily complex expressions in conditional ACEs. See "[MS-DTYP] 2.4.4.17 Conditional ACEs" [1] for details. Security resource attributes can be queried and set by handle via GetSecurityInfo and SetSecurityInfo. This information is set in the system access control list (SACL), but it does not require the privileged ACCESS_SYSTEM_SECURITY right. It only requires READ_CONTROL and WRITE_DAC access.
CopyFileExW also copies extended attributes. These are commonly set on system files, but in this case they're usually "$Kernel." attributes [2], which cannot be set from user mode. IIRC, WSL also uses them. Otherwise extended attributes are not used much at all because the Windows API provides no way to query and set them separately. (They're supported in the NT API via NtQueryEaFile and NtSetEaFile.) When creating a new file via CreateFileW, we can pass it a handle to a template file from which to copy extended attributes. But that doesn't help with copystat(), which requires copying extended attributes onto an existing file.
Regarding data, CopyFileExW copies all $DATA streams [3] in a file, not just the anonymous $DATA stream. The stream names and sizes can be read via GetFileInformationByHandleEx: FileStreamInfo. Just loop over the stream names to try copying them individually.
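The loop over stream names could look roughly like the sketch below. The helper names are hypothetical, the 64 KiB buffer is an arbitrary assumption (a robust version would retry on ERROR_MORE_DATA), and the buffer parser follows the documented FILE_STREAM_INFO layout: NextEntryOffset, StreamNameLength, StreamSize, StreamAllocationSize, then the UTF-16 name.

```python
import shutil
import struct
import sys

FILE_STREAM_INFO_CLASS = 7  # FileStreamInfo in FILE_INFO_BY_HANDLE_CLASS


def parse_stream_info(buf):
    """Return [(stream_name, size), ...] from a FILE_STREAM_INFO buffer."""
    entries = []
    offset = 0
    while True:
        next_offset, name_len, size, _alloc = struct.unpack_from("<IIqq", buf, offset)
        start = offset + 24  # fixed-size header is 24 bytes
        name = bytes(buf[start:start + name_len]).decode("utf-16-le")
        entries.append((name, size))
        if next_offset == 0:
            return entries
        offset += next_offset


def list_streams(path):
    """List the $DATA streams of a file (Windows only)."""
    if sys.platform != "win32":
        raise NotImplementedError("alternate data streams are Windows-specific")
    import ctypes
    import msvcrt

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_info = kernel32.GetFileInformationByHandleEx
    get_info.argtypes = (ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p, ctypes.c_uint32)
    get_info.restype = ctypes.c_int
    buf = ctypes.create_string_buffer(64 * 1024)  # assumed large enough
    with open(path, "rb") as f:
        handle = msvcrt.get_osfhandle(f.fileno())
        if not get_info(handle, FILE_STREAM_INFO_CLASS, buf, len(buf)):
            raise ctypes.WinError(ctypes.get_last_error())
    return parse_stream_info(buf.raw)


def copy_named_streams(src, dst):
    """Copy every named stream of src onto the existing file dst."""
    for name, _size in list_streams(src):
        if name == "::$DATA":
            continue  # anonymous stream, already handled by copyfile()
        stream = name.rsplit(":", 1)[0]  # ":FooBar:$DATA" -> ":FooBar"
        with open(src + stream, "rb") as fin, open(dst + stream, "wb") as fout:
            shutil.copyfileobj(fin, fout)
```

With the example from the report, `copy_named_streams("original.txt", "copied-with-python3.txt")` would be expected to carry the FooBar stream across after a plain `copy2`.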
For complete consistency, copytree should copy named data streams in directories. (A directory can't have an anonymous data stream, but it can have named streams, and it's not uncommon to store metadata about a directory like this. Ignoring this data is inconsistent, but it's a matter of opinion whether complete consistency is worthwhile.) This can be implemented at a high level via CreateDirectoryExW by passing the source directory path as the lpTemplateDirectory parameter. However, CreateDirectoryExW also preserves whether the directory is compressed or encrypted. I don't know whether copytree should preserve those attributes.
[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/10dc22eb-788d-4343-b556-0b6969fe58ca
[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ifs/kernel-extended-attributes
[3] https://docs.microsoft.com/en-us/windows/win32/fileio/file-streams
The relevant API on macOS is [f]copyfile(3), which is at the POSIX layer. The copyItem API linked in msg357395 is a higher-level ObjC/Swift API.
Using the copyfile(3) API has another advantage beyond this issue: This API can perform a clone action on APFS when the right flags are used.
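For reference, a minimal ctypes sketch of calling copyfile(3) directly. The helper name `copy_with_copyfile` is hypothetical, and the flag values are what I believe `<copyfile.h>` defines; they should be checked against the SDK headers before relying on them. On platforms other than macOS the sketch falls back to `shutil.copy2`.

```python
import ctypes
import ctypes.util
import os
import shutil
import sys

# Flag values believed to match <copyfile.h>; verify against the SDK in use.
COPYFILE_ACL = 1 << 0
COPYFILE_STAT = 1 << 1
COPYFILE_XATTR = 1 << 2
COPYFILE_DATA = 1 << 3
COPYFILE_SECURITY = COPYFILE_STAT | COPYFILE_ACL
COPYFILE_METADATA = COPYFILE_SECURITY | COPYFILE_XATTR
COPYFILE_ALL = COPYFILE_METADATA | COPYFILE_DATA
COPYFILE_CLONE = 1 << 24  # best-effort APFS clone, falls back to a copy


def copy_with_copyfile(src, dst, flags=COPYFILE_ALL):
    """Copy src to dst with copyfile(3) on macOS (hypothetical helper)."""
    if sys.platform != "darwin":
        # Assumption: keep the stdlib behaviour everywhere else.
        return shutil.copy2(src, dst)

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # int copyfile(const char *from, const char *to,
    #              copyfile_state_t state, copyfile_flags_t flags)
    libc.copyfile.argtypes = (
        ctypes.c_char_p, ctypes.c_char_p, ctypes.c_void_p, ctypes.c_uint32,
    )
    libc.copyfile.restype = ctypes.c_int
    if libc.copyfile(os.fsencode(src), os.fsencode(dst), None, flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), src)
    return dst
```

Passing `COPYFILE_METADATA` alone would correspond to a copystat()-style call, while `COPYFILE_CLONE` requests the clone behaviour mentioned above. According to the man page, `COPYFILE_CLONE` implies `COPYFILE_EXCL`, so the destination must not already exist.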
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['OS-mac', '3.8', 'type-bug', 'library', 'OS-windows']
title = "copy2 doesn't copy metadata on Windows and MacOS"
updated_at =
user = 'https://github.com/greggman'
```
bugs.python.org fields:
```python
activity =
actor = 'ronaldoussoren'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)', 'macOS', 'Windows']
creation =
creator = 'greggman'
dependencies = []
files = []
hgrepos = []
issue_num = 38906
keywords = []
message_count = 6.0
messages = ['357395', '357411', '357454', '357564', '357572', '357912']
nosy_count = 9.0
nosy_names = ['paul.moore', 'ronaldoussoren', 'giampaolo.rodola', 'tim.golden', 'ned.deily', 'zach.ware', 'eryksun', 'steve.dower', 'greggman']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue38906'
versions = ['Python 3.8']
```