python / cpython

The Python programming language
https://www.python.org
Other
62.75k stars 30.07k forks source link

os.path.normpath doesn't preserve unicode #50077

Closed 25a81009-9caa-4147-a13b-b4843223656b closed 14 years ago

25a81009-9caa-4147-a13b-b4843223656b commented 15 years ago
BPO 5827
Nosy @loewis, @ezio-melotti
Files
  • normpath.patch: (Obsolete) Fix for posixpath.normpath and ntpath.normpath.
  • normpath.2.patch: Fix for posixpath.normpath and ntpath.normpath.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/ezio-melotti' closed_at = created_at = labels = ['type-bug', 'library', 'expert-unicode'] title = "os.path.normpath doesn't preserve unicode" updated_at = user = 'https://bugs.python.org/mgiuca' ``` bugs.python.org fields: ```python activity = actor = 'ezio.melotti' assignee = 'ezio.melotti' closed = True closed_date = closer = 'ezio.melotti' components = ['Library (Lib)', 'Unicode'] creation = creator = 'mgiuca' dependencies = [] files = ['13757', '15362'] hgrepos = [] issue_num = 5827 keywords = ['patch'] message_count = 7.0 messages = ['86395', '95223', '95459', '95490', '95493', '95495', '97621'] nosy_count = 5.0 nosy_names = ['loewis', 'kcwu', 'ezio.melotti', 'mgiuca', 'sandberg'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue5827' versions = ['Python 2.6', 'Python 2.7'] ```

    25a81009-9caa-4147-a13b-b4843223656b commented 15 years ago

    In the Python 2.x branch, os.path.normpath will sometimes return a str even if given a unicode. This is not an issue in the Python 3.0 branch.

    This happens specifically when it throws away all string data and constructs its own:

    >>> os.path.normpath(u'')
    '.'
    >>> os.path.normpath(u'.')
    '.'
    >>> os.path.normpath(u'/')
    '/'

    This is a problem if working with code which expects all strings to be unicode strings (sometimes, functions raise exceptions if given a str, when expecting a unicode).

    I have attached patches (with test cases) for posixpath and ntpath which correctly preserve the unicode-ness of the input string, such that the new behaviour is:

    >>> os.path.normpath(u'')
    u'.'
    >>> os.path.normpath(u'.')
    u'.'
    >>> os.path.normpath(u'/')
    u'/'

    I tried it on os2emxpath and plat-riscos/riscospath (the other two OS-specific path modules), and it already worked fine for them. Therefore, this patch fixes all necessary OS-specific versions of os.path.

    ezio-melotti commented 14 years ago

    Thanks for the patch, I tried it on Linux and it seems to solve the problem.

    A few comments about it: 1) I'd change all the self.assertEqual(type(posixpath.normpath(u"")), unicode) to self.assertTrue(isinstance(posixpath.normpath(u""), unicode)); 2) a test for normpath(u'.') should probably be added; 3) in ntpath.py the 'slash' is actually a backslash, so the name of the var should be changed;

    25a81009-9caa-4147-a13b-b4843223656b commented 14 years ago

    Thanks Ezio.

    I've updated the patch to incorporate your suggestions.

    Note that I too have only tested it on Linux, but I tested both posixpath and ntpath (and there is no OS-specific code, except for the filenames themselves).

    I'm not sure if using assertTrue(isinstance ...) is better than assertEqual(type ...), because the type equality checking produces this error: AssertionError: \<type 'str'> != \<type 'unicode'> while isinstance produces this unhelpful error: AssertionError: False is not True

    But oh well, I made the change anyway as most test cases use isinstance.

    ezio-melotti commented 14 years ago

    assertTrue() also accepts a 'msg' argument where to explain what went wrong in case of failure [1].

    [1]: http://docs.python.org/library/unittest.html#unittest.TestCase.assertTrue

    82cd7325-7a5f-4549-80ba-96122c718fe0 commented 14 years ago

    Also, assertTrue has an alias failUnless which I personally find more descriptive (I don't know if either form is preferred for inclusion in Python though).

    ezio-melotti commented 14 years ago

    failUnless is deprecated in Python3.1 [1]. The assert methods are preferred over the fail ones that are now deprecated.

    [1]: http://docs.python.org/3.1/library/unittest.html#unittest.TestCase.failUnless

    ezio-melotti commented 14 years ago

    Fixed in r77442 (trunk) and r77443 (release26-maint), thanks!