Closed cjerdonek closed 5 years ago
This issue is to ensure that argparse.ArgumentParser() accepts objects that support the "in" operator for the "choices" argument to ArgumentParser.add_argument().
As observed by Terry in the comments to bpo-16418:
http://bugs.python.org/issue16418#msg175520
the argparse module does not in general support "choices" values that support the "in" operator, even though the argparse documentation says it does:
"Any object that supports the in operator can be passed as the choices value, so dict objects, set objects, custom containers, etc. are all supported."
(from http://docs.python.org/2/library/argparse.html#choices )
For example, passing a user-defined type that implements only self.__contains__() yields the following error message when calling ArgumentParser.parse_args():
File ".../Lib/argparse.py", line 1293, in add_argument raise ValueError("length of metavar tuple does not match nargs")
(The error message also gives the wrong reason for failure. The swallowed exception is "TypeError: '\<class>' object is not iterable.")
For the record, choices types implementing only __contains__ never worked in any cases. (I should have said ArgumentParser.add_argument() raises a ValueError in the above.)
So I wonder if we should classify this as an enhancement and simply document the restriction in maintenance releases to iterable types. Clearly the module was written under the assumption (in multiple places) that choices are iterable.
Also, if we do change this, perhaps we should fall back to displaying the metavar in help messages when naming the container rather than using repr(). A message like the following, for example, wouldn't be very helpful or look very good:
invalid choice: 0 (choose from \<main.Container object at 0x10555efb0>)
I think we should avoid letting Python creep into help and usage text.
If the assumption of iterability is in more than just the help and error messages, then I can see the point of calling this an enhancement. I suspect that now, if a custom subset of ints or strings is the actual domain, people just skip choices and add custom checks after argparse processing. I am curious what Steven thinks.
Adding a failing test. I will supply a patch shortly.
Attaching patch. With this patch, passing a non-iterable choices argument to parser.add_argument() raises (for example):
Traceback (most recent call last):
...
File ".../Lib/argparse.py", line 558, in _metavar_formatter
choice_strs = [str(choice) for choice in action.choices]
TypeError: 'MyChoices' object is not iterable
instead of the incorrect:
File ".../Lib/argparse.py", line 1333, in add_argument raise ValueError("length of metavar tuple does not match nargs") ValueError: length of metavar tuple does not match nargs
Is it okay to change this exception type in maintenance releases? The other option is to keep the error as a ValueError but to change the error message, though I think TypeError is the correct exception to allow through. Note that the existing ValueError is preserved for other code paths. Indeed, there are several tests checking for this ValueError and its error message, which the attached patch does not break.
If we want to consider accepting non-iterable choices for 3.4, we can still have that discussion as part of a separate patch.
Slight doc tweak: s/container/sequence/.
Since the line between a type error and a value error is fuzzy anyway, I'd be in favor of maintaining the backward compatibility here. We don't consider exception message content part of the API (though we do occasionally work to preserve it when we know people are depending on it), but the exception *types* generally are.
Sounds fine. Does that mean a test should still be added for the message? I was never clear on this because on the one hand we want to be sure we use the right message (and that we're actually fixing the issue), but on the other hand we don't want the message to be part of the API. By the way, to make things slightly less brittle, I could be clever and trigger a TypeError to get the right message.
I guess another option would be to mark the test CPython-only.
CPython only would not be appropriate, as it is not.
What I usually do in such cases is use AssertRaisesRegex looking for some critical part of the message that represents the functionality we are looking for rather than the exact text. In this case, it would be looking for the type name of the problem value in the message, since that is how we are identifying the specific source of the error.
The exception question is messy, but I think it is the wrong question. The doc is correct in that it says what the code should be doing. To test whether an argument is in a collection of choices, the code should just say that: 'arg in choices' (as far as I know, it does -- for the actual check). In other words, I think the original intent of this issue is correct.
"Clearly the module was written under the assumption (in multiple places) that choices are iterable." I think it should not. We implement 'in' with '__contains__', rather than forcing the use of iteration, for good reason. I discussed some examples in msg175520.
As far as I know, the reason argparse iterates is to bypass the object's representation methods and produce custom, one-size-fits-all, usage and error messages. As discussed in bpo-16418, this supposed user-friendliness may not be. To me, restricting input for this reason is a tail-wags-dog situation. If the object is not iterable, just use repr for the messages instead of exiting. Let the app writer be responsible for making them user-friendly and not absurdly long.
I don't disagree that this feature could be useful. I'm just not sure it should go into a maintenance release. It feels like an enhancement to me because to use this feature, the user will have to use the API in a way they haven't before, and we will probably have to do things like add documentation and examples for this new use case (e.g. explaining that users passing non-iterable choices will need to implement a user-friendly repr() for help to render nicely).
Also, it introduces new questions like: if we're going to be using repr() for that case, then why wouldn't we allow repr() to be used for iterable choices if the user would like to better control the behavior (e.g. for very long lists)? Why not have a unified way to deal with this situation (e.g. something like __argparse_repr__ with a better name, or else provide or document that certain formatters should be used)? These don't seem like bug-fix questions.
As far as I know, the reason argparse iterates is to bypass the object's representation methods and produce custom, one-size-fits-all, usage and error messages.
Another possibility is that it was the most helpful message for the use case the writers originally had in mind. Certainly, for example, seeing the available choices '0, 1, 2, 3, 4' is more useful than seeing 'xrange(5)'.
Note that we would also have to deal with not just the error text but also the usage string. From the docs:
>>> parser.parse_args('11'.split())
usage: PROG [-h] {5,6,7,8,9}
PROG: error: argument foo: invalid choice: 11 (choose from 5, 6, 7, 8, 9)
I took a good look at the 3.3 code. With respect to the main purpose of choices -- checking user input -- argparse does not require that choices be iterable, as it *does* use 'in', as it should. Line 2274:
if action.choices is not None and value not in action.choices:
So unless the usage message is generated even when not needed (I did not delve that far), non-iterables should work now as long as the user does not request the usage message or make an input mistake.
If that is so, then this issue is strictly about the string representation of non-iterable choices. A mis-behaving tail is not a reason to kill the dog ;-). The easy answer, and the only sensible one I see, is to use either str() or repr(). But how to do that? I think this and bpo-16418 have to be fixed together.
The current format-by-iteration method, used for both usage and exceptions, works well for small iterable collections. But it is obnoxious for non-small collections. As I mentioned before, it will just hang for infinite iterables, which is even worse. So the method needs to be changed anyway. And to do that, it should be factored out of the three places where it is currently done in-line.
At 557: elif action.choices is not None: choice_strs = [str(choice) for choice in action.choices] result = '{%s}' % ','.join(choice_strs)
To match the code below, so it can be factored out into a function, change the last two lines to
choices_str = ','.join(str(c) for c in action.choices)
result = '{%s}' % choices_str
At 597: (note that 'params' is adjusted action.__dict__) if params.get('choices') is not None: choices_str = ', '.join([str(c) for c in params['choices']]) params['choices'] = choices_str
The intermediate list in the 2nd line is not needed choices_str = ', '.join(str(c) for c in params['choices'])
I am aware of but do not understand ',' versus ', ' as joiner. I also do not understand why both versions of choices_str are needed. Are there two different usage messages?
At 2276: 'choices': ', '.join(map(repr, action.choices))}
or, replacing map with comprehension choices_str = ', '.join(repr(c) for c in action.choices) 'choices': choices_str}
Now define choices_str(src, joiner, rep), delete 559 and 598, and modify
559: ... result = '{%s}' % choices_str(action.choices, ',', str)
599: ... params['choices'] = choices_str(param['choices'], ', ', str)
2276: ... 'choices': choices_str(action.choices, ', ', repr}
(I am assuming that the two joiners are really needed. If not, delete.)
Now we should be able to write choices_str to solve all 3 problems in the two issues. My coded suggestion:
from itertools import islice
N = 21 # maximum number (+1) of choices for the current nice string.
# This is just an illustrative value, experiment might show better #.
def choices_str(src, joiner, rep):
prefix = list(islice(src, N))
if len(prefix) < N: # short iterables
return joiner.join(rep(c) for c in prefix) # current string
else:
try: # non-short sliceable finite sequences
head = joiner.join(rep(c) for c in src[:N//2])
tail = joiner.join(rep(c) for c in src[N//4:])
return head + ' ..., ' + tail
except AttributeError: # no __getindex__(slice), everything else
return repr(src)
argparse does not require that choices be iterable, as it *does* use 'in', as it should. Line 2274: if action.choices is not None and value not in action.choices:
There are cases where it's incorrect for argparse to being using "in" instead of sequence iteration, which again leads me to think that iteration is what was intended. See bpo-16977.
So unless the usage message is generated even when not needed (I did not delve that far), non-iterables should work now as long as the user does not request the usage message or make an input mistake.
As I said in the second comment of this issue, this doesn't work in any case because the ValueError is raised on add_argument(). So I don't see how the lack of this option can be affecting any users.
_Actions_container(object) [1198 in 3.3.0 code] .addargument() [1281] does not directly check for choices.\_iter ('__iter' is not in the file). Nor does it use the 3.x-only alternative isinstance(choices, collections) ('Iterable' is also not in the file). Rather it formats the help string for each argument as added.
The complete statement that raises the error is (now at 1321): try: self._get_formatter()._format_args(action, None) except TypeError: raise ValueError("length of metavar tuple does not match nargs")
def _get_formatter is part of class ArgumentParser(_AttributeHolder, _ActionsContainer): so 'self' has to be an ArguementParser for the above exception.
_format_args calls get_metavar, which is returned by _metavar_formatter. The last contains the first of the three choices iterations that I propose to factor out and revise. So that proposal should eliminate the particular exception from add_argument.
The docstring for class Action mirrors the doc: '''
Trying to prettily format messages peripheral to the main goal should not take indefinitely or infinitely long, nor make the message worse, nor raise an exception.
I have a new suggestion that I hope will satisfy Terry.
After looking at the code more, I noticed that add_argument() does currently work for non-iterable choices provided that metavar is passed. This was also noted in the report for the duplicate bpo-16697.
On reflection, this makes sense because that's what metavar is for -- providing a replacement string for the usual formatting in the help text. The only thing that doesn't work in this case is error message formatting.
With that, I'd like to propose the following:
(1) Change the add_argument() error raised for non-iterable choices from:
ValueError("length of metavar tuple does not match nargs")
to something like:
ValueError("metavar must be provided for non-iterable choices")
This provides the help string representation for non-iterable choices (in the spirit of "Explicit is better than implicit").
(2) Make the error text the following for non-iterable choices (the error message currently breaks for non-iterable choices):
PROG: error: argument FOO: invalid choice: 'xxx'
compared with (for iterable choices)--
PROG: error: argument FOO: invalid choice: 'xxx' (choose from ...)
I think this is preferable to inserting the str() or repr() (especially for maintenance releases) because str() and repr() may not be meant for displaying to the end-users of a script. The instructions/description of such choices is already in the add_argument() "help" string. We could consider appending that to provide substantive guidance.
Actually, let me relax (1). We can just use what the argparse code calls the "default_metavar" in that case (which it constructs from the argument name).
The important thing for me is not displaying the str/repr when it might not be intended.
If you can somewhat solve the problem by better using the existing api, good. I am not 'stuck' on reusing str/repr*. If metavar is non-optional for non-iterable choices, the doc should say so in the entry for choices. (Does the test suite already have a testcase already for non-iterable choices + metavar?)
If you can solve it even better and remove that limitation by extending the 'default_metaver' idea from positional and optional args to choices ('choiced' args), even better.
I think the refactoring may still be needed, especially for bpo-16418, but that is that issue.
Here is a patch for discussion that allows non-iterable choices with or without metavar. It refactors as suggested. However, the patch does not yet have tests.
Does the test suite already have a testcase already for non-iterable choices + metavar?
No, not that I saw. Also, to answer a previous question, the three places in which the choices string is used are: in the usage string (separator=','), in the help string when expanding "%(choices)s" (separator=', '), and in the error message text (separator=', ' with repr() instead of str()).
I think we have converged on the right solution. The patch looks good as far as it goes, assuming that it passes the current + unwritten new tests. It will also be a good basis for reconsidering what to do with long/infinite iterables in bpo-16418.
I think the doc patch should recommend passing a metavar arg with non-iterable choices. With that, using default metavars, whatever they end up being, is fine as long as no exception is raised. So I would make the tests of non-iterable with no metavar passed as general as possible. App writers who do not like the defaults should override them ;-).
If I understand correctly, if choices is not iterable and the user enters an invalid choice, the choice part of the error message (passed to ArgumentError) will just be 'invalid choice: \<value>'.
Minor nit: .join does not need a temporary list as arg and works fine with genexp: choices_str = sep.join(to_str(choice) for choice in choices)
the choice part of the error message (passed to ArgumentError) will just be 'invalid choice: \<value>'.
That's right. With the patch it looks like this:
>>> p = argparse.ArgumentParser(prog='test.py')
>>> p.add_argument('--foo', choices=c)
>>> p.parse_args(['--foo', 'bad'])
usage: test.py [-h] [--foo FOO]
test.py: error: argument --foo: invalid choice: 'bad'
With that, using default metavars, whatever they end up being
argparse's default metavar is just to capitalize the "dest" if the argument is optional, otherwise it is the dest itself (which is always or usually lower-case):
def _get_default_metavar_for_optional(self, action):
return action.dest.upper()
def _get_default_metavar_for_positional(self, action):
return action.dest
So the patch uses that. You can see the former case in the above usage string. Also, with the patch the current tests pass, btw.
chris.jerdonek wrote: "Also, to answer a previous question, the three places in which the choices string is used are: in the usage string (separator=','), in the help string when expanding "%(choices)s" (separator=', '), and in the error message text (separator=', ' with repr() instead of str())."
In the usage string, the ',' is used to make a compact representation of the choices. The ', ' separator is used in the help line, where space isn't as tight.
This 'choices formatting' is called during 'add_argument()' simply as a side effect of checking for valid parameters, especially 'nargs' (that it, is an integer or an acceptable string). Previously 'nargs' errors were not caught until 'parse_args' was used. This is discussed in
http://bugs.python.org/issue9849 Argparse needs better error handling for nargs
http://bugs.python.org/issue16970 argparse: bad nargs value raises misleading message
On the issue of what error type to raise, my understanding is that 'ArgumentError' is the preferred choice when it affects a particular argument. parse_args() nearly always raises an ArgumentError. Once add_argument has created an action, it too can raise an ArgumentError. ArgumentError provides a standard way of identifying which action is giving the problem.
While testing 'metavar="range(0,20)"', I discovered that the usage formatter strips off parenthesis. A regex expression that removes excess 'mutually exclusive group' notation is responsible for this. The simple fix is to modify the regex so it distinguishes between ' (...)' and 'range(...)'. I intend to create a new issue for this, since it affects any metavar the includes ().
I'd suggest not worrying about the default metavar in the _expand_help() method. The formatted choice string created in that method is only used if the help line includes a '%(choices)s' string. The programmer can easily omit that if he isn't happy with the expanded list.
TestHelpVariableExpansion is the only test in test_argparse.py that uses it.
Sig('--foo', choices=['a', 'b', 'c'],
help='foo %(prog)s %(default)s %(choices)s')
This patch generally deals with the choices option, and specifically the problems with formatting long lists, or objects that have __contains but not __iter. But it also incorporates issues 9849 (better add_argument testing) and 9625 (choices with *). It may be too broad for this issue, but the changes all relate to 'choices'.
As noted by other posters, there are 3 places where choices is formatted with a comprehension. I have refactored these into one _format_choices function.
_formatchoices() is a utility function that formats the choices by iteration, and failing that using repr(). It raises an error if choices does not even have a \_contains__. It also has a summarize option ({1,2,3,...,19,20}). I did not make this an action method because it only uses the choices object.
_metavar_formatter() - calls _format_choices for Usage with the default compact form. Its use of metavar gives the user full control of the choices display.
_expand_help() - calls _format_choices with the looser format. This form is used only if the user puts '%(choices)s' in the help string. This is not documented, and only appears a few times in the test file. Again the user has ultimate control over the contents.
_check_value() - calls _format_choices with a 'summarize=15' option. Normally this error message appears with the usage message. So it does not need to use the metavar.
The MetavarTypeHelpFormatter subclass is an example of how formats can be customized without changing normal behavior. Such a subclass could even be used to set custom parameters, or modify any of the above methods.
-------------------- other changes:
formatter _format_actions_usage() - I tweaked the regex that trims excess notation from mutually exclusive groups. This removed '()' from other parts of the usage line, for example a metavar like 'range(20)'. bpo-18349.
formatter _format_args() - I included bpo-9849 changes which improve testing for nargs, and array metavars. This calls the _metavar_formatter. Thus any errors in formatting choices pass back through this.
bpo-9849 also changes container add_argument() to call the parser _check_argument(). This in turn calls _format_args() to test action options like nargs, metavars, and now choices. If there are problems it raises an ArgumentError.
parser _get_values() - bpo-9625 changes this to correctly handle choices when nargs='*'.
parser _checkvalue() - I rewrote this to give better errors if there are problems with \_contains. If choices is a string (e.g. 'abc') it converts it to a list, so __contains is more consistent. For example, 'bc' in 'abc' is True, but 'bc' in ['a','b','c'] is False (bpo-16977)
---------------------- test_argparse
change examples with string choices to lists
class TestAddArgumentMetavar change EXPECTED_MESSAGE and EXPECTED_ERROR to reflect bpo-9849 changes
class TestMetavarWithParen tests 'range(n)' choices makes sure () in the metavar are preserved tests that metavar is used in Usage as given tests summarized list of choices in the error message tests the %(choices)s help line case
class TestNonIterableChoices tests a choices container that has __contains but not __iter tests that repr() is used as needed
class TestBareChoices tests a class without even __contains__ tests for an add_argument error
class TestStringChoices tests the expansion of 'abc' to ['a','b','c']
I just submitted a patch to http://bugs.python.org/issue11874 which rewrites _format_actions_usage(). It now formats the groups and actions directly, keeping a list of these parts. It no longer has to cleanup or split a usage line into parts. So it is not sensitive to special characters (space, [] or () ) in the choices metavar.
This issue has sat idle for six years. Meanwhile, the docs are still incorrect, giving every user wrong information about how the module works. Can we consider just changing the documentation instead of worrying about what the behavior should be or what the rationale is?
But see
https://bugs.python.org/issue37793
for a discussion of whether 'container' is as good a descriptor as 'iterable'.
I think this should be dropped. IMO it is a pedantic nit. There is the everyday usage of the word "container" which has a reasonable plain meaning. There is also an ABC that was unfortunately called Container (with a capital C) that means anything the defines __contains__. IMO, we don't have to change all uses for the former just because the latter exists.
AFAICT, this "issue" for the argparse has never been a real problem for a real user ever in the entire history of the module.
So, unless there are strong objections, I would like to close this and we can shift our attention to matters of actual importance.
Here is an example of someone who cares about the behavior and/or the documentation (and still cares enough to check up on their StackOverflow question six years later): https://stackoverflow.com/questions/13833566/python-argparse-choices-from-an-infinite-set/13833736 .
The issue is not the use of the word "container". The docs literally say "Any object that supports the in
operator can be passed as the choices value" and that is not true. In fact, changing it to say "any container type" would be an improvement as that is at least arguably correct, whereas the current documentation is unambiguously incorrect.
Okay, that makes sense. Go ahead with the edit.
New changeset 84125fed2a45a9e454d7e870d8bbaf6ece3d41e8 by Raymond Hettinger in branch 'master': bpo-16468: Clarify which objects can be passed to "choices" in argparse (GH-15566) https://github.com/python/cpython/commit/84125fed2a45a9e454d7e870d8bbaf6ece3d41e8
New changeset 0d45d50e421b46b56195821580c3760b43813106 by Raymond Hettinger (Miss Islington (bot)) in branch '3.8': bpo-16468: Clarify which objects can be passed to "choices" in argparse (GH-15566) (GH-15587) https://github.com/python/cpython/commit/0d45d50e421b46b56195821580c3760b43813106
The documentation issue discussed and fixed in the above commit was reintroduced in 8e76d7e1a9592b24717204ad307b7493d593191f, which introduces a quick link/summary table including the description "[...], Any object that supports in
operator".
Minor changes have been made in later commit:
in
operator".All of these descriptions are incorrect for the same reasons discussed in the above thread, and the use of the term "collection/container" should be used less formally, rather then referring to collections.abc.Container or objects that implement the in
operator.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['easy', '3.8', 'type-bug', 'library', '3.9']
title = 'argparse only supports iterable choices'
updated_at =
user = 'https://github.com/cjerdonek'
```
bugs.python.org fields:
```python
activity =
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Library (Lib)']
creation =
creator = 'chris.jerdonek'
dependencies = []
files = ['28731', '28734', '28735', '28765', '30872']
hgrepos = []
issue_num = 16468
keywords = ['patch', 'newcomer friendly']
message_count = 34.0
messages = ['175534', '175574', '175584', '180007', '180009', '180010', '180021', '180023', '180024', '180032', '180049', '180052', '180053', '180067', '180071', '180075', '180168', '180169', '180186', '180191', '180710', '180720', '192110', '192273', '192714', '193190', '349914', '349915', '349924', '349928', '349930', '349934', '350748', '350753']
nosy_count = 12.0
nosy_names = ['rhettinger', 'terry.reedy', 'bethard', 'r.david.murray', 'chris.jerdonek', 'BrenBarn', 'berker.peksag', 'paul.j3', 'hashimo', 'Radu.Ciorba', 'sebix', 'nanjekyejoannah']
pr_nums = ['15566', '15587']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue16468'
versions = ['Python 3.8', 'Python 3.9']
```