python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.25k forks source link

shutil.disk_usage() on Windows can't properly handle unicode #70518

Closed giampaolo closed 6 years ago

giampaolo commented 8 years ago
BPO 26330
Nosy @pfmoore, @vstinner, @giampaolo, @tjguk, @zware, @eryksun, @zooba, @Mariatta, @csabella
PRs
  • python/cpython#5184
  • python/cpython#5188
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.7', 'OS-windows', 'docs'] title = "shutil.disk_usage() on Windows can't properly handle unicode" updated_at = user = 'https://github.com/giampaolo' ``` bugs.python.org fields: ```python activity = actor = 'Mariatta' assignee = 'docs@python' closed = True closed_date = closer = 'Mariatta' components = ['Documentation', 'Windows'] creation = creator = 'giampaolo.rodola' dependencies = [] files = [] hgrepos = [] issue_num = 26330 keywords = ['patch'] message_count = 11.0 messages = ['260017', '260018', '260020', '260021', '260022', '260023', '309936', '309941', '309947', '309984', '309985'] nosy_count = 10.0 nosy_names = ['paul.moore', 'vstinner', 'giampaolo.rodola', 'tim.golden', 'docs@python', 'zach.ware', 'eryksun', 'steve.dower', 'Mariatta', 'cheryl.sabella'] pr_nums = ['5184', '5188'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue26330' versions = ['Python 3.6', 'Python 3.7'] ```

    giampaolo commented 8 years ago

    On Python 3.4, Windows 7:

    >>> import shutil, os
    >>> path = 'psuugxik1s0è'
    >>> os.stat(path)
    os.stat_result(st_mode=33206, st_ino=6755399441249628, st_dev=3158553679, st_nlink=1, st_uid=0, st_gid=0, st_size=27136, st_atime=1455
    116789, st_mtime=1455116789, st_ctime=1455116789)
    >>>
    >>> shutil.disk_usage(path)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\python34\lib\shutil.py", line 989, in disk_usage
        total, free = nt._getdiskusage(path)
    NotADirectoryError: [WinError 267] The directory name is invalid
    >>>
    vstinner commented 8 years ago

    total, free = nt._getdiskusage(path) NotADirectoryError: [WinError 267] The directory name is invalid

    The underlying C function is GetDiskFreeSpaceEx(): https://msdn.microsoft.com/fr-fr/library/windows/desktop/aa364937%28v=vs.85%29.aspx

    It takes a lpDirectoryName parameter: "A directory on the disk."

    Is psuugxik1s0è a directory?

    It looks more like a shutil.disk_usage() documentation issue than an Unicode issue.

    giampaolo commented 8 years ago

    You are right, my bad. I'll fix doc mentioning that on Windows "path" can only be a directory (on UNIX it can also be a file).

    vstinner commented 8 years ago

    You are right, my bad.

    No problem. I read the doc before replying, and it is not said that the path must exist or must be a directory: https://docs.python.org/dev/library/shutil.html#shutil.disk_usage

    I'll fix doc mentioning that on Windows "path" can only be a directory (on UNIX it can also be a file).

    Great!

    giampaolo commented 8 years ago

    Different but kind of related, disk_usage() is not able to accept bytes:

    >>> shutil.disk_usage(b'.')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\python34\lib\shutil.py", line 989, in disk
        total, free = nt._getdiskusage(path)
    TypeError: must be str, not bytes
    >>>
    vstinner commented 8 years ago

    Different but kind of related, disk_usage() is not able to accept bytes:

    On Python 3, I don't think that it's a big issue: bytes filenames are deprecated.

    See the current thread on python-dev: https://mail.python.org/pipermail/python-dev/2016-February/143150.html

    It's really much better to use Unicode on Windows, and I also suggest you to use Unicode on UNIX/BSD.

    csabella commented 6 years ago

    I've submitted a PR for the documentation change.

    eryksun commented 6 years ago

    This is the high-level shutil module, so why not try to use the resolved parent directory? For example:

        def disk_usage(path):
            try:
                total, free = nt._getdiskusage(path)
            except NotADirectoryError:
                path = os.path.dirname(nt._getfinalpathname(path))
                total, free = nt._getdiskusage(path)
            used = total - free
            return _ntuple_diskusage(total, used, free)

    Also, as noted in msg260022, nt._getdiskusage was overlooked when implementing PEP-529. The same applies to nt._getfinalpathname and nt._getvolumepathname. nt._getfullpathname works with bytes because it takes an argument-clinic path_t instead of unicode or Py_UNICODE. I think the other 3 should be rewritten to use path_t, but it's out of scope for this issue.

    Mariatta commented 6 years ago

    New changeset ee3b83547c6b0cac1da2cb44aaaea533a1d1bbc8 by Mariatta (Cheryl Sabella) in branch 'master': bpo-26330: Update shutil.disk_usage() documentation (GH-5184) https://github.com/python/cpython/commit/ee3b83547c6b0cac1da2cb44aaaea533a1d1bbc8

    Mariatta commented 6 years ago

    New changeset fb8569e36f2629654d5bc9c7ba05978edce408f4 by Mariatta (Miss Islington (bot)) in branch '3.6': bpo-26330: Update shutil.disk_usage() documentation (GH-5184) (GH-5188) https://github.com/python/cpython/commit/fb8569e36f2629654d5bc9c7ba05978edce408f4

    Mariatta commented 6 years ago

    Thanks!