python / cpython

The Python programming language
https://www.python.org
Other
63.53k stars 30.44k forks source link

Argparse considers unknown optional arguments with spaces as a known positional argument #66623

Open 8d93642b-220e-4155-8796-ceda76e3bd70 opened 10 years ago

8d93642b-220e-4155-8796-ceda76e3bd70 commented 10 years ago
BPO 22433
Nosy @blueyed
PRs
  • python/cpython#20924
  • Files
  • argparse.py.patch: Bugfix for argparse 1.1
  • argparse_bug_example.py: Bug demonstration
  • example.py
  • patch_1.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['extension-modules', 'type-bug', '3.10'] title = 'Argparse considers unknown optional arguments with spaces as a known positional argument' updated_at = user = 'https://bugs.python.org/DenKoren' ``` bugs.python.org fields: ```python activity = actor = 'terry.reedy' assignee = 'none' closed = False closed_date = None closer = None components = ['Extension Modules'] creation = creator = 'DenKoren' dependencies = [] files = ['36641', '36643', '36674', '36704'] hgrepos = [] issue_num = 22433 keywords = ['patch'] message_count = 6.0 messages = ['227007', '227009', '227193', '227261', '227310', '227403'] nosy_count = 4.0 nosy_names = ['bethard', 'blueyed', 'paul.j3', 'DenKoren'] pr_nums = ['20924'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue22433' versions = ['Python 3.10'] ```

    8d93642b-220e-4155-8796-ceda76e3bd70 commented 10 years ago

    Argparse version 1.1 consider ANY unknown argument string containig ' ' (space character) as positional argument. As a result it can use such unknown optional argument as a value of known positional argument.

    Demonstration code:

    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--known-optional-arg", "-k", action="store_true")
    parser.add_argument("known_positional", action="store", type=str)
    parser.parse_known_args(["--known-optional-arg", "--unknown-optional-arg=with spaces", "known positional arg"])
    parser.parse_known_args(["--known-optional-arg", "--unknown-optional-arg=without_spaces", "known positional arg"])

    Bugfix is attached to issue and affects ArgumentParser._parse_optional() method body.

    Sorry, if it is a better way to report (or, possibly, fix) a bug than this place. It is my first python bug report.

    Thanks.

    8d93642b-220e-4155-8796-ceda76e3bd70 commented 10 years ago

    I've signed 'Contributor's Agreement'.

    Bug demonstration code listed in issue description is in file attached to comment.

    It can be run with following command: python argparse_bug_example.py

    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 10 years ago

    Your patch fails:

    python3 -m unittest test_argparse.TestEmptyAndSpaceContainingArguments

    specifically these 2 subcases:

       (['-a badger'], NS(x='-a badger', y=None)),
       (['-y', '-a badger'], NS(x=None, y='-a badger')),

    --------------------

    At issue is the ambiguity when the user gives you a string that looks like an optionals flag, but doesn't match one of the defined arguments. When should it be treated as a positional, and when should it be treated as an unknown?

    The code, and test says - if it has the space, treat it like a positional. You are trying to reverse that choice - if it has the prefix, treat it like an unknown optional. At the point where you apply the patch, we already know that the string has a prefixchar. So you are, effectively, eliminating the 'space' test.

    I've added a simpler example of where the presence of the space flips that choice.

    8d93642b-220e-4155-8796-ceda76e3bd70 commented 10 years ago

    There is an standard way to solve this ambiguity.

    There is a special marker '--' used to force argument parsing function treat all arguments given in command after this marker as positional arguments. It was invented specially for tasks where you need to force argument parsing engine ignore option indicator and treat argument as positional.

    I know this argument as standard 'de facto', but can't find any documents about '--' interpretation. But you can see, that Linux 'libc' standard C library function getopt() knows about this flag: http://linux.die.net/man/3/getopt So, any utility using this function knows about '--'.

    For example: grep, tee, cat, tail, head, vim, less, rm, rmdir, common shells (sh, bash, csh, dash, ...) and so on. The list is large. Almost any utility except special ones like 'echo' which main function is to print any argument given to utility.

    So, I think, that the true way here is to parse all arguments staring with '-'/'--' and listed before special '--' marker as optional, and treat all of arguments listed after '--' as positional ones and do not try to use any secondary indicators to make a decision: 'optional' or not.

    For example: """ any parameter before '--' special marker starting with '--' is treated as optional # python3 example.py --bar="one two" xxx (Namespace(pos='xxx'), ['--bar=one two'])

    all parameters after '--' special marker are treated as positional # python3 example.py -- --bar="one two" xxx (Namespace(pos='--bar=one two'), ['xxx']) """

    Yes, my patch does not the solution and argparse should be patched another way.

    Where and how can I get unittests for argparse module to check my code modifications before posting them here? I want to try to modify argparse and produce the patch changing argparse behaviour as described above.

    Thanks.

    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 10 years ago

    Proposed patches like this are supposed to be generated against the current development version (3.5...), especially if they are 'enhancements' (as opposed to bugs). But there isn't much of a difference in argparse between 2.7+ and 3.4+ (except one nested yield expression).

    Tests are in 'lib/test/...'

    argparse does implement the '--' syntax. That is, anything after it is understood to be positional, regardless of its prefix characters. But before that, optionals and positionals can be freely mixed (within limits).

    I agree that '--foo=one two' looks a lot more like an unknown optional than the test case 'a badger'. Strings with '=' are tested earlier in _parse_optional.

    A fix that passes test_argparse.py and your example is:

            if '=' in arg_string:
                option_string, explicit_arg = arg_string.split('=', 1)
                if option_string in self._option_string_actions:
                    action = self._option_string_actions[option_string]
                    return action, option_string, explicit_arg
                else: # added for unrecognized
                    return None, option_string, explicit_arg
                    # or return None, arg_string, None

    But the '=' case is also tested in the following line:

        option_tuples = self._get_option_tuples(arg_string)

    The obvious difference is that _get_option_tuples handles abbreviations.

    I wonder if the two can be refined to reduce the duplication, and handler your case as well.

    There are other bug issues dealing with multiple '--', the mixing of optionals and positionals, and controlling whether abbreviations are allowed or not. I don't recall any others dealing with strings that contain '=' or space.

    7a064fe6-c535-4d80-a11f-a04ed39056c5 commented 10 years ago

    I've added a patch with tests that I think handles this case, without messing with existing test cases. It adds a new 'space' test right before the existing one, one that explicitly handles an '=' arg_string.

    If the space is in the 1st part, it is marked as a positional (as it does in the existing code). But if the space is in the argument portion has no such effect; the arg_string will be handled as an unknown optional.

            if '=' in arg_string:
                option_prefix, explicit_arg = arg_string.split('=', 1)
                if ' ' in option_prefix:
                    return None
                else:
                    return None, arg_string, None

    The new testcase is in the TestParseKnownArgs class (near the end of test_argparse.py), and tests the different ways in which an arg_string can be allocated to a positional or unknown.

    The underlying idea is that elsewhere in '_parse_optional()', an arg_string with '=' is handled just like a two part optional. It first tries an exact match, and then tries an abbreviation match. In that spirit, this space test also focuses on the flag part of the arg_string, not the argument part.

    Much as I like this solution, I still worry about backward compatibility. As discussed in 'http://bugs.python.org/issue9334', 'argparse does not accept options taking arguments beginning with dash (regression from optparse)', compatibility in this _parse_optional() function is a serious issue. There the proposed patch adds a switch to the API. But an API change is itself messy, and not to be taken lightly if it isn't needed.

    This patch has some comments that should be stripped out before it is actually applied. There is a quite a backlog argparse issues, so I don't expect quick action here.

    terryjreedy commented 10 months ago

    I think the following, based on the closed duplicate, and run in IDLE 3.13.013, Win10, demonstrates the issue.

    >>> import argparse
    >>> parser = argparse.ArgumentParser()
    >>> parser.add_argument('args', nargs='*')
    _StoreAction(option_strings=[], dest='args', nargs='*', const=None, default=None, type=None, choices=None, required=True, help=None, metavar=None)
    >>> parser.parse_args(['--wrong-arg="foo bar"'])
    Namespace(args=['--wrong-arg="foo bar"'])
    >>> parser.parse_args(['--wrong-arg="foobar"'])
    usage: [-h] [args ...]
    : error: unrecognized arguments: --wrong-arg="foobar"
    dbarnett commented 2 months ago

    Hi there, ran into this issue in 2024, found this existing issue for it, but seems like progress stalled out in 2014 (a decade ago... wow!).

    Is there general agreement that space is the wrong heuristic for determining what could be a conditional arg? Could anyone summarize status here and current options?