python / cpython

shlex not posix compliant when parsing "foo#bar" #51860

Open 6d69c0ac-1a5e-48a1-bd0d-eb9b2943948f opened 14 years ago

6d69c0ac-1a5e-48a1-bd0d-eb9b2943948f commented 14 years ago
BPO 7611
Nosy @terryjreedy, @merwok, @meadori
Files
  • lexer_test.py: test to show shlex behaviour
  • shlex_posix.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['type-feature', 'library', '3.9', '3.10']
    title = 'shlex not posix compliant when parsing "foo#bar"'
    updated_at =
    user = 'https://bugs.python.org/jjdmol2'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'jjdmol2'
    dependencies = []
    files = ['15709', '15718']
    hgrepos = []
    issue_num = 7611
    keywords = ['patch']
    message_count = 7.0
    messages = ['97081', '97082', '97125', '112740', '148270', '148292', '148456']
    nosy_count = 6.0
    nosy_names = ['terry.reedy', 'ferringb', 'eric.araujo', 'meador.inge', 'jjdmol2', 'cadf']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue7611'
    versions = ['Python 3.9', 'Python 3.10']
    ```

    6d69c0ac-1a5e-48a1-bd0d-eb9b2943948f commented 14 years ago

    The shlex parser parses "foo#bar" as "foo", discarding the rest as a comment. This is actually one of the test cases, even in POSIX mode.

    However, POSIX (see below) only allows comments to start at the beginning of a token, so "foo#bar" has to result in a "foo#bar" token. To easily see this, do "echo foo#bar" in bash, versus "echo foo #bar".

    Fixing this might break some applications that rely on this broken behaviour, even though they're not strictly POSIX compliant.

    POSIX 2008, Rationale C.2.3 (which refers to Shell & Utilities 2.3(10)):

    The (10) rule about '#' as the current character is the first in the sequence in which a new token is being assembled. The '#' starts a comment only when it is at the beginning of a token. This rule is also written to indicate that the search for the end-of-comment does not consider escaped \<newline> specially, so that a comment cannot be continued to the next line.
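The rule quoted above can be sketched in a few lines. This is only an illustration, not shlex's implementation or the attached patch: the helper name `split_with_posix_comments` is hypothetical, and the sketch assumes a single line of input with no quoting or escaping.

```python
def split_with_posix_comments(line):
    """Split one line on blanks; '#' starts a comment only at the start of a token.

    Hypothetical helper for illustration: quoting and escapes are ignored.
    """
    tokens = []
    current = []  # characters of the token being assembled
    for ch in line:
        if ch in " \t":
            # A blank ends the token in progress, if any.
            if current:
                tokens.append("".join(current))
                current = []
        elif ch == "#" and not current:
            # '#' is the first character of a new token: the rest is a comment.
            break
        else:
            # Any other character, including '#' mid-token, joins the token.
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


print(split_with_posix_comments("foo#bar"))   # ['foo#bar']
print(split_with_posix_comments("foo #bar"))  # ['foo']
```

The only state the comment decision needs is whether a token is currently being assembled, which is why "foo#bar" stays a single token while "foo #bar" loses its tail.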

    6d69c0ac-1a5e-48a1-bd0d-eb9b2943948f commented 14 years ago

    Attached a program which shows the relevant behaviour:

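    import shlex
    
    # "#" inside a token vs. "#" at the start of a token
    tests = ["foo#bar", "foo #bar"]
    
    for t in tests:
        # Tokenize in POSIX mode and print the resulting token list
        print("%s -> %s" % (t, list(shlex.shlex(t, posix=True))))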

    results in

    $ python lexer_test.py
    foo#bar -> ['foo']
    foo #bar -> ['foo']

    (expected of course is ['foo#bar'] on the first line).
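For comparison, here is a small sketch, assuming Python 3, of how the same inputs behave through `shlex.split`. With its default `comments=False`, comment handling is switched off entirely, so "foo#bar" survives, but so does "#bar" as a separate token. With `comments=True` the behaviour reported here reappears. Neither setting gives the POSIX result for both inputs.

```python
import shlex

tests = ["foo#bar", "foo #bar"]

for t in tests:
    # comments=False (the default) disables comment parsing altogether.
    no_comments = shlex.split(t, comments=False)
    # comments=True treats '#' as a comment start even in the middle of a token.
    with_comments = shlex.split(t, comments=True)
    print("%r: comments=False -> %s, comments=True -> %s"
          % (t, no_comments, with_comments))
```

Code that needs "foo#bar" kept intact and has no use for comments can rely on the default; code that needs both comments and POSIX token rules is what this issue is about.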

    d6b6fd0c-c508-448a-8aee-ea900bbfcbb8 commented 14 years ago

    Here's a patch addressing the behavior described.

    terryjreedy commented 14 years ago

    Given that test_shlex.py tests for the current behavior, it is hard to call this a bug in the tracker sense of the term. I would only change this in a new version.

    The manual just says "When operating in POSIX mode, shlex will try to be as close as possible to the POSIX shell parsing rules." but gives no reference to which authority it is following or what the rules are in either case. Manual section 23.2.2. Parsing Rules only discusses the differences between posix and non-posix rules, not the common rules.

    I suspect this module was written well over a decade ago, maybe closer to two. Is it possible that earlier versions were different on this issue? Or is the 2008 version only cosmetically different from some 1990s version?

    merwok commented 12 years ago

    > The manual just says "When operating in POSIX mode, shlex will try to be as close as possible to the POSIX shell parsing rules." but gives no reference to which authority it is following or what the rules are in either case.

    I think it actually does: The POSIX specification defines the behavior of a compliant /bin/sh shell.

    See also bpo-1521950.

    terryjreedy commented 12 years ago

    The doc section has no reference, as in a live web link, to any version of the POSIX specification. This is unlike other doc sections that implement various RFCs (which also get updated). The docs also link to specific references for the Unicode version supported, which has changed from version to version.

    The OP quotes (without giving a link) from the 2008 version. POSIX and shlex are much older than that, implying that shlex might conform to an earlier version, just as other modules implement older RFCs that have been superseded.

    meadori commented 12 years ago

    Here are some of the relevant links from POSIX 2008:

    1. Shell Command Language - http://pubs.opengroup.org/onlinepubs/9699919799/idx/shell.html
    2. Shell Command Language Rationale - http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html

    Sections 2.3 (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03) and 2.10 (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10) of [1] are particularly relevant.