Open be1526b9-3573-4f3d-abe3-d392fe59c2a9 opened 7 years ago
Trying to use Pathlib and Path.replace on Windows if drive are different leads to an issue:
File "D:\\myscript.py", line 184, in update
client_generated_path.replace(destination_folder)
File "c:\\program files (x86)\\python35-32\\Lib\\pathlib.py", line 1273, in replace
self.\_accessor.replace(self, target)
File "c:\\program files (x86)\\python35-32\\Lib\\pathlib.py", line 377, in wrapped
return strfunc(str(pathobjA), str(pathobjB), \*args)
OSError: [WinError 17] The system cannot move the file to a different disk drive: 'C:\\\\MyFolder' -\> 'D:\\\\MyFolderNewName'
This is a known situation of os.rename, and workaround I found is to use shutil or to copy/delete manually in two steps (e.g. http://stackoverflow.com/questions/21116510/python-oserror-winerror-17-the-system-cannot-move-the-file-to-a-different-d)
When using Pathlib, it's not that easy to workaround using shutil (even if thanks to Brett Cannon now shutil accepts Path in Py3.6, not everybody has Py3.6). At least this should be documented with a recommendation for that situation. I love Pathlib and it's too bad my code becomes complicated when it was so simple :(
Just to confirm, I was able to workaround it with Py3.6:
# client_generated_path.replace(destination_folder)
shutil.move(client_generated_path, destination_folder)
It is reasonable to think about adding a similar feature to pathlib if it doesn't support it and just special-case the drive-to-drive scenario for Windows
Moving a file across volumes isn't atomic. Getting an exception in this case can be useful. There could be a "strict" keyword-only parameter that defaults to False. If it's true, then replace() won't try to move the file.
I agree this needs to be different from replace(), due to not being atomic. That makes it an enhancement, so I'm removing 3.5.
I'm +1 on Path supporting something like shutil.move().
I also support the idea of getting something like shutil.move() into pathlib.
I should also mention that rename() (https://docs.python.org/3/library/pathlib.html#pathlib.Path.rename) and replace() (https://docs.python.org/3/library/pathlib.html#pathlib.Path.replace) already do exist, so it might be best to add a keyword-only flag to one of those for this use-case.
I'm also +1 on adding pathlib.Path.move()
. However I'd prefer we move the shutil
implementation into pathlib, as proposed here. The implementation in shutil
would become a stub that calls into pathlib, roughly:
def move(src, dst, copy_function=copy2):
return pathlib.Path(src).move(dst, copy_function=copy2)
I'm interested in this approach because I'm hoping to add a pathlib.AbstractPath
class in future. Users would be able to subclass AbstractPath
, and by implementing a small handful of low-level abstract methods like stat()
, open()
and iterdir()
, benefit from high-level methods like move()
in their objects. This would work even if the source and destination are different kinds of path (e.g. a local filesystem path and a path in a tar archive).
To achieve this we'd need to make pathlib accept bytes
paths, as shutil does. Personally I'm fine with this - it's not much work and perfectly reasonable.
We'd also need to move other shutil functions that move()
relies on to pathlib, notably the copy*()
functions, excluding copyfileobj()
and friends.
On rename()
/ replace()
: I'd like these to call through to move(atomic=True)
. IIRC these methods are identical on POSIX and only differ slightly on Windows. That's a wrinkle we should smooth out in pathlib IMO.
I'd prefer we move the
shutil
implementation into pathlib
Sounds reasonable to me, though would we want to include copy_function
in the API? It seems like an opportunity to say that Path
is opinionated about the "most move-like way to copy", and if you need to override that then handle it yourself or stick with shutil
.
I think @barneygale 's plan is sound.
I agree with @zooba, exposing the copy_function
in the pathlib
API seems incompatible with the general style of pathlib
. Otherwise the plan looks good.
I agree about copy_function
. Perhaps we could do something like this?
def copy(self, dst, *, follow_symlinks=True, copy_mode=True, copy_stat=False):
...
def move(self, dst, *, atomic=False, **kwargs):
if atomic:
return os.rename(self, dst)
self.copy(dst, **kwargs)
self.unlink()
I don't like the atomic
argument, and especially its default. Implementations should get to implement it as efficiently as possible without needing to respect that kind of argument, and callers shouldn't expect that level of control over semantics.
For example, on Windows we explicitly disallow os.rename
from moving files to other drives, because it wouldn't match our documented behaviour. But a new WindowsPath.move
function should be able to allow that, and take advantage of the OS optimisations (yes, they exist!) for moving a file rather than copying it across drives.
If we were to define semantics for atomic
, the only way to allow this would be "atomic=True
will try to move, and atomic=False
won't bother trying to move", at which point why would anybody ever pass False? Especially since there's no back-compat here, and so code on older versions will have to stick with try: ...rename ... except: ... copy
.
We should make these functions an improvement in semantics, not just discoverability, and ideally more appropriate for whatever platform they're being used on than the POSIX semantics inherent to posixmodule.
So I think the only option we want on move
is follow_symlinks=True
,[^1] and the method "does whatever is necessary to move the file or directory as efficiently as possible into the new location while matching the semantics of an in-place rename" (roughly, copy_mode=True, copy_stat=True
)
Similarly, I think we should simplify copy
down to only have follow_symlinks=True
,[^2] and the method "does whatever is necessary to copy the file or directory as efficiently as possible into the new location while matching the semantics of a newly created file" (roughly, copy_mode=some bits, copy_stat=some bits
- and this one is why the vague definition is important 😉 )
[^1]: Where follow_symlinks=False
is carefully defined to mean "if self
is a link, it moves the link unless that would cause the new link to fail to refer to the same target as the original link, in which case the target is copied into the new location and the original link is removed".
[^2]: I think the same definition as for move works here, but I haven't thought through this one as thoroughly.
My example was a bit rubbish - here's a corrected version:
def copy(self, dst, *, follow_symlinks=True, copy_mode=True, copy_stat=True):
...
def move(self, dst, *, atomic=False, **kwargs):
try:
return os.rename(self, dst)
except OSError:
if atomic:
raise # Move can't be achieved atomically
self.copy(dst, **kwargs)
self.unlink()
So move(atomic=True)
would call os.rename()
only and allow exceptions to propagate, whereas atomic=False
will try a rename, and failing that use a copy/delete. I think there's common use cases for both.
I think I agree with you on removing copy_mode and copy_stat.
I suppose one could argue that folks should just use os.rename()
if they want an atomic rename! :)
I suppose one could argue that folks should just use
os.rename()
if they want an atomic rename! :)
Yeah, I'll argue that one ;)
We'll have to explain that move
may result in both old and new names existing simultaneously while the move takes place, depending on the operating system and file systems in use. That's a perfectly fine place to say that if you'd rather that not happen, and are willing to accept exceptions if it's unavoidable, then you should use rename
.
Should we deprecate pathlib.Path.rename()
and replace()
when we add move()
?
Why? They'll just be aliases (with a default arg) to move()
, won't they? It's easier to just leave them there.
That would be a change of behaviour though, right? Path.rename()
currently raises an exception if you attempt to move across filesystems, whereas if we changed it to call shutil.move()
it would not.
I assumed if you were going to deprecate it, then you'd be adding the atomic
option to be able to get the same behaviour back. If that option isn't there, definitely don't deprecate the old functions.
Oh right. I thought we agreed that users could just call os.rename()
or os.replace()
if they specifically need that behaviour, and that we wouldn't add an atomic argument to Path.move()
. I think I still don't understand what you're proposing! Sorry :/
I think just add the new method, and don't deprecate any of the old ones. Moving people from Path.rename()
to os.rename()
feels like unnecessary churn to inflict.
We probably want to improve the docs of those to point people towards move()
though, as we believe (don't we?) that it'll work more reliably in more circumstances. But there's no reason to force them.
Thanks for the explanation!
Would you also rule out deprecating Path.rename()
and Path.replace()
in future? If so I'm -1 on adding move()
. It's regrettable that we have two methods for moving files, and I wouldn't want to permanently add a third.
Technically, we have two functions for renaming files, and none for moving ;-)
I'd want to see it shown that they're a maintenance burden or an attractive nuisance[^1] before deprecating. Right now, I'm not convinced they are either (a bit of an attractive nuisance, but their long history outweighs that).
[^1]: Something that users assume will solve their issue, but it actually makes things worse for them.
Re-visiting this issue!
Earlier I suggested that we should be able to move the shutil.move()
implementation into pathlib
. I now believe that won't work, because:
bytes
, whereas shutil doescopy_function
stuff doesn't fit well in pathlibSo I think that means we need to call shutil.move()
from pathlib.Path.move()
.
We now have a pathlib._abc.PathBase
base class which ought to have a move()
method too. That method can be implemented atop the existing self.walk()
, self.open()
, etc methods, though it means also implementing versions of copy()
, copytree()
and rmtree()
. These could be private to begin with, or we could add all four public methods at once.
We could make pathlib.Path.move()
accept a pathlib._abc.PathBase | os.PathLike
destination. If the destination is os.PathLike
we call shutil.move()
, otherwise we call super().move()
. If we made copy()
and copytree()
public too, users would have a pretty versatile set of tools to copy/move files not just across local filesystems, but between entirely different types of filesystem. So:
source_path = pathlib.Path('images')
target_path = s3.S3Path('web/images', bucket=..., client=...)
source_path.copytree(target_path)
There's already quite a lot of pathlib changes in 3.13, so I won't look at landing any of this until main
becomes 3.14.
In shutil.move()
and the mv
shell command, if the destination is an existing directory, the source is copied into that directory.
Is this something we want to replicate in pathlib? It can be a source of bugs, e.g. if you shutil.move('a', 'b')
and assume the new file is at 'b'.
Alternatively we could split it into move()
and move_into()
perhaps?
We shouldn't replicate that behaviour, but if the destination ends with a slash then I think we're okay to keep the original name and make that one work (pathlib paths won't ever end with a slash, I believe).
PRs available:
Once those land I can add copytree()
and finally move()
Is this a firm established? Because to me, adding the parts of shutil
to pathlib
looks very wrong. Why you cannot just use shutil
directly? All this hassle with the path-like protocol was for making pathlib
more usable with os
, shutil
and other modules without increasing coupling between modules. Adding the shutil
dependency to pathlib
will provide another argument to avoid using pathlib
.
Without having looked at the details, I agree with Serhiy. We should avoid import shutil
in modules that are "lower level" than it, which includes pathlib.
Passing a Path
to shutil functions and having them work correctly (by calling back into pathlib, rather than assuming anything else about the underlying file system) is the important part. Maybe that already works fine? I haven't tried it with e.g. an S3 or Blob Storage implementation of Path.
But perhaps we should be adding copytree/rmtree methods to pathlib anyway that are self-contained? So the base implementation will work for anything that implements its copy
, glob
and unlink
methods (I agree with copy
, move
and rename
being added).
Is this a firm established? Because to me, adding the parts of
shutil
topathlib
looks very wrong. Why you cannot just useshutil
directly?
This is an argument against pathlib in general, and you're more than a decade too late unfortunately.
All this hassle with the path-like protocol was for making
pathlib
more usable withos
,shutil
and other modules without increasing coupling between modules.
The os.PathLike
stuff meant that os
didn't need to import pathlib
, but you seem to be suggesting the opposite? This patch doesn't make shutil
import pathlib
.
Adding the
shutil
dependency topathlib
will provide another argument to avoid usingpathlib
.
Outline that argument, please? Increased import time or something?
Passing a
Path
to shutil functions and having them work correctly (by calling back into pathlib, rather than assuming anything else about the underlying file system) is the important part. Maybe that already works fine? I haven't tried it with e.g. an S3 or Blob Storage implementation of Path.
Pathlib strips important information (e.g. leading ./
, trailing /
) so we can't implement shutil.rmtree()
using Path.rmtree()
, because we'd get different behaviour. More here: https://discuss.python.org/t/pathlib-preserve-trailing-slash/33389
We could maybe add a RawUnnormalizingLocalPath
class that doesn't perform normalization, which we could employ from shutil.rmtree()
.
There's some more discussion here: https://discuss.python.org/t/incrementally-move-high-level-path-operations-from-shutil-to-pathlib/19208
I might revive that thread as it covers exactly this ticket. Perhaps adding these methods is only compelling as part of making PathBase
public.
so we can't implement
shutil.rmtree()
usingPath.rmtree()
, because we'd get different behaviour.
Should we implement them separately, then? I suspect shutil
is only going to work for "normal" filesystem paths, and so the best thing to do is for pathlib
to have its own implementations in terms of its own functions, and for shutil
to use __fspath__
and reject anything that doesn't return a file system path?
Should we implement them separately, then?
That's what I'm intending to do in the PathBase
implementations - they won't use shutil
except for copyfileobj()
. All the implementations will be generic and built on lower-level methods like PathBase.open()
, symlink_to()
, etc.
The story is more complicated in Path
:
copy()
should call shutil.copy2()
if its argument is os.PathLike
, because shutil.copy2()
has extensive low-level code to properly copy metadatarmtree()
should probably call shutil.rmtree()
to make use of its fd-based safe traversal where available. Alternatively I could add dir_fd
support to Path.walk()
, but it's a bit of a can of wormscopytree()
can be inherited from PathBase
(it uses walk()
, mkdir()
and copy()
)move()
can also be inherited from PathBase
(it uses copytree()
, rmtree()
, etc)
copy()
should callshutil.copy2()
if its argument isos.PathLike
, becauseshutil.copy2()
has extensive low-level code to properly copy metadata
It's not extensive, there's just a choice between using open
and _winapi.CopyFile
on Windows and some error handling. Having separate implementations on WindowsPath
and PosixPath
ought to make this look much simpler (though of course, we need to fall back on a read/write even in the Windows case, for implementations that may not replicate all of _winapi
).
rmtree()
should probably callshutil.rmtree()
to make use of its fd-based safe traversal where available.
It can do this for WindowsPath
and PosixPath
, but the base implementation shouldn't use shutil
because it can't be sure it's dealing with a file system. Using walk
, unlink
and rmdir
here ought to be sufficient, and retrying a failed rmdir
by re-walking the directory is probably as robust as it needs to be.
move()
can also be inherited fromPathBase
(it usescopytree()
,rmtree()
, etc)
And we should also (not necessarily in this issue) add a WindowsPath
override that uses MoveFileEx
, which can already do cross-volume moves if requested.
copy()
should callshutil.copy2()
if its argument isos.PathLike
, becauseshutil.copy2()
has extensive low-level code to properly copy metadataIt's not extensive, there's just a choice between using
open
and_winapi.CopyFile
on Windows and some error handling.
It can call through to copystat()
, which does more. It feels pretty icky to copy-paste all that into pathlib, right? Maybe I'm overreacting.
rmtree()
should probably callshutil.rmtree()
to make use of its fd-based safe traversal where available.It can do this for
WindowsPath
andPosixPath
, but the base implementation shouldn't useshutil
because it can't be sure it's dealing with a file system. Usingwalk
,unlink
andrmdir
here ought to be sufficient, and retrying a failedrmdir
by re-walking the directory is probably as robust as it needs to be.
Yes that's what I mean, sorry for not being clear. Path.rmtree()
calls shutil.rmtree()
, but PathBase.rmtree()
is a generic implementation atop walk()
, unlink()
and rmdir()
. This is implemented in #119060
It can call through to
copystat()
, which does more.
Yeah, copystat
is a bit nasty. Maybe it's something that deserves a (private?) os
implementation? It seems more in line with that module to me.
Most of these functions seem to be careful error handling. I'm not sure we can do anything but copy paste all that.
Most of these functions seem to be careful error handling. I'm not sure we can do anything but copy paste all that.
Couldn't we call shutil.copy2()
from Path.copy()
if the target is os.PathLike
? That's how I've implemented it in #119058, and it avoids any copy-pasting!
os.PathLike
is the criteria used by things outside of pathlib, not by pathlib itself, so I hope you're just using it casually. pathlib ought to know whether its own subclasses are for file systems or not, and if it's a user subclass, then the implementer should override.
But the main objection is to the architectural coupling between the modules - shutil
should not be a dependency of pathlib
, just as it should not be a dependency of os.path
. The direction of the relationship flows the other way.
The best way to deal with it would be to move the implementations from shutil
into pathlib
and then update shutil
to call them there (or move them into os
or os.path
).
But the main objection is to the architectural coupling between the modules -
shutil
should not be a dependency ofpathlib
, just as it should not be a dependency ofos.path
. The direction of the relationship flows the other way.
I agree that shutil
is higher-level than os
, but I don't think that shutil
is uniformly higher level than pathlib
. I think they're mostly on about the same conceptual level, providing "high level" functions built on lower-level building blocks. e.g. pathlib has Path.glob()
and Path.walk()
that do recursive directory walking using low-level scandir()
, and shutil has copytree()
and rmtree()
that do the same.
In some cases shutil
is higher-level, particularly its archiving operations.
In other cases it's slightly lower-level IMO, particularly the implementations of copyfile()
, copystat()
and friends, which wouldn't seem out-of-place in os
, and deal with low-level facets of the local filesystem (BSD symlink permissions, xattrs etc) that pathlib doesn't currently touch.
Well, that's my current view anyway. You have decades more experience on this than I do, so I'll see what it looks like to implement this without pathlib importing shutil :)
Very off-topic, but I seem to only be getting email notifications of @barneygale's comments, and nothing for any of @zooba's. Which makes following the conversation weirdly difficult... Does anyone know how this could even happen? I have checked, and Steve's messages aren't ending up in my spam folder, so I don't think it's a problem with my email client.
The best way to deal with it would be to move the implementations from
shutil
intopathlib
and then updateshutil
to call them there
Just to be clear, do you mean that shutil
would call new module functions in pathlib
, or that shutil
would instantiate some kind of Path
object and call appropriate methods?
Because if it's the latter, I'm pretty sure we're going to need a new path type (I'll call it "PedanticPath" here) that sits above Path
in the hierarchy:
Unlike Path
, this type would support bytes
paths and skip all of pathlibs usual normalization, which means instantiating a PedanticPath
object from a string is "lossless" and safe to do from shutil
functions. It's actually pretty easy to implement, because PathBase
also doesn't normalize by default.
(I think you probably meant module functions, but I wanted to check)
I seem to only be getting email notifications of @barneygale's comments, and nothing for any of @zooba's.
I wouldn't blame you for filtering my messages 😆 But no, I'm afraid I don't know of any reason other than that.
Just to be clear, do you mean that
shutil
would call new module functions inpathlib
, or thatshutil
would instantiate some kind ofPath
object and call appropriate methods?
I was deliberately non-specific, because I don't know off the top of my head which would be best and I trust you to figure it out.
I'd half hoped that instantiating Path
from shutil would be enough, but if you say it's not, then I'm a bit sad but I believe you. Bytes paths should be round-tripping through surrogateescape
though, so there shouldn't be any loss. I do recall the discussions about trailing backslashes though.
Now you've shown what would have to happen to the hierarchy, I agree that module-level helper functions are probably best, and putting them in os
probably makes even more sense. That module has already gone so far beyond exposing merely what POSIX specifies that I don't think we're breaking anything architecturally there, and the functions ought to only rely on os
and os.path
functions anyway.
Trailing slashes are indeed the sticking point :-(. If I could go back in time I'd ask Antoine to retain them somehow when he designed pathlib, then go kill baby hitler etc.
I like the idea of moving the lowest-level stuff into os
or os.path
. I'll try that :-)
But no, I'm afraid I don't know of any reason other than that.
Would you do me a favour, and add a comment to https://github.com/pfmoore/editables/issues/24, just as somewhere I know I should see notifications from? It'll give me an idea if it's just this repo or more general.
Weird. I got the notification for that. So the problem is just on this issue (or maybe this project). I'll do some investigating.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['3.8', 'type-feature', 'library', '3.9', '3.10']
title = 'Support moving across filesystems in pathlib.Path, as shutil.move() does'
updated_at =
user = 'https://bugs.python.org/LaurentMazuel'
```
bugs.python.org fields:
```python
activity =
actor = 'eryksun'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Laurent.Mazuel'
dependencies = []
files = []
hgrepos = []
issue_num = 29805
keywords = []
message_count = 6.0
messages = ['289549', '289552', '289559', '289630', '289687', '289688']
nosy_count = 8.0
nosy_names = ['brett.cannon', 'paul.moore', 'eric.smith', 'tim.golden', 'Laurent.Mazuel', 'zach.ware', 'eryksun', 'steve.dower']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue29805'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']
```
Linked PRs