Open CSDUMMI opened 2 years ago
I’ve been bitten by this before, and I think it would be a nice feature. But wouldn’t it break code currently using getattr?
You are right.
Is that happening a lot?
In that case, a work-around could be added to make it possible to use getattr with the dash name.
But that's not very nice, right?
I'd expect that this is really not a lot of work and that removing a work around and replacing it with the convention in existing code should benefit the code quality of any project currently using it.
I guess you could make it available under both names, but I'm not sure it's worth the hassle.
I don't think we should break existing code, even if it would be better off with the change.
PEP 387 lays out the rules for backward compatibility in Python.
Given a set of arguments, the return value, side effects, and raised exceptions of a function. This does not preclude changes from reasonable bug fixes.
I would consider this a reasonable bug fix, because this behavior is undocumented and diverges from the convention explicitly documented for optional arguments in argparse.
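For readers who have not hit this yet, here is a minimal sketch of the divergence being described. The argument names foo-bar and --baz-qux are made up for illustration; only public argparse calls are used.

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
parser.add_argument("foo-bar")     # positional: the name is stored verbatim
parser.add_argument("--baz-qux")   # optional: dest is inferred, dashes become underscores

args = parser.parse_args(["abc", "--baz-qux", "xyz"])

# The optional is reachable as a normal attribute; the positional is
# stored under a key that is not a valid Python identifier.
print(vars(args))
```

The resulting namespace holds the key foo-bar for the positional and baz_qux for the optional, which is the inconsistency under discussion.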
In my years working on Python, I've learned that every change will break something.
In that case, breaking something should not be an argument against changing something.
I fear that adding a workaround to support code using getattr with dash names could lead to other problems.
For example, if you iterate over the resulting namespace, or use vars or dir on it, a workaround to allow for both the - and _ version of a positional argument name would lead to the same argument being contained twice within the namespace.
With the work around enabled, this would happen:
import argparse
parser = argparse.ArgumentParser("example")
parser.add_argument("foo-bar", type=str)
args = parser.parse_args(["abc"])
print(vars(args)) # { "foo-bar" : "abc", "foo_bar" : "abc" }
And that is yet another undocumented, unintuitive behavior with potential for breaking existing code.
In that case, breaking something should not be an argument against changing something.
I don’t think that follows.
But my point is the bar for change is high, and I don’t think this meets the criteria.
I consider it an undocumented, unintuitive and not obvious behavior.
Behavior that is not documented, is also behavior that is not guaranteed.
And I would also wager, that it was not intentional.
Thus this should be considered as a reasonable bug fix.
It must be either fixed or at the very least be documented in the argparse documentation. Not everybody will be familiar with getattr and know that it can be used to work around this bug.
I agree it should at least be documented.
So:
If your positional argument name contains a -, you must use getattr(args, "foo-bar") instead of args.foo_bar, because this might break backwards compatibility if fixed.
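A short sketch of what such a documentation note would be describing, assuming a positional named foo-bar as in the earlier example:

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
parser.add_argument("foo-bar")
args = parser.parse_args(["abc"])

# The value is stored only under the dashed name, so getattr is needed.
value = getattr(args, "foo-bar")

# The underscore spelling is not set for positionals.
has_underscore_name = hasattr(args, "foo_bar")

print(value, has_underscore_name)
```

Writing args.foo_bar directly would raise AttributeError here, which is the surprise a note in the docs would have to explain.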
Explaining that note in the documentation might be a little hard.
The problem here is that the longer it is not fixed, the harder it will be to fix it eventually.
And who really wants to have to explain this behavior indefinitely to people new to argparse?
The problem here is that the longer it is not fixed, the harder it will be to fix it eventually.
That's always true for all such changes.
I'm not saying it shouldn't be done. I'm saying we'll probably break working code, and that's a very high bar for a change. I occasionally make changes to argparse, but here I'll wait for other core devs to weigh in. In the meantime, a doc PR would be welcomed.
I'm having trouble finding the right place to add a warning about this behavior.
To be more specific:
Should the warning + work around be added as inline documentation of the argparse module or as part of Doc/library/argparse.rst?
For optionals, there is a POSIX convention of accepting dashes in the flag strings, so conversion to underscore makes sense.
For positionals, there isn't any good reason to use dashes - unless you want to make life difficult for yourself. You are free to use any 'dest' string, even ones that start with numbers and contain odd characters. Internally, argparse uses the dest with setattr/getattr, so it isn't bothered by odd characters.
And if you must have dashes in the usage or help, use the metavar.
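As a rough illustration of the metavar suggestion (the names input_file and --output-file are assumptions chosen to match the later comments in this thread):

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
# The first argument doubles as the dest, so use a valid identifier there
# and let metavar control what usage and help display.
parser.add_argument("input_file", metavar="input-file")
parser.add_argument("--output-file")

args = parser.parse_args(["in.txt", "--output-file", "out.txt"])
usage = parser.format_usage()

print(usage)
print(args.input_file, args.output_file)
```

The usage line shows input-file, while both values are available as ordinary attributes.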
During the debugging phase it's a good idea to include a 'print(args)' line, so you aren't surprised by changes, or nonchanges, to the 'dest'.
In _get_optional_kwargs, the dash replace is done only when it is inferred from the long option string. It is not done when you provide an explicit 'dest' parameter.
For a positional, _get_positional_kwargs gets the first (and only) string as the 'dest'. It does not do any checking or replacement.
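The difference between an inferred and an explicit dest for optionals can be observed through the public API alone. This is only a sketch; the dest value my-dest is a deliberately odd, made-up name.

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
parser.add_argument("--foo-bar")                   # dest inferred from the long option
parser.add_argument("--baz-qux", dest="my-dest")   # explicit dest, given as-is

args = parser.parse_args(["--foo-bar", "1", "--baz-qux", "2"])

# The inferred dest has its dash replaced; the explicit one is left untouched.
print(sorted(vars(args)))
```

So even for optionals, a dash can end up in the namespace if it is passed explicitly as dest.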
I think the 'dest' documentation is clear enough. "For positional argument actions, dest is normally supplied as the first argument ". The dash conversion is clearly identified as an optionals feature.
Yes, there is a good reason to want to use dashes for positional arguments: consistency with optional arguments. It looks bad, unprofessional, like I didn't bother copyediting my documentation, to have both "input_file" and "--output-file" in the --help output.
The dash conversion is not clearly identified as optionals-only. The actual text of the documentation is
For positional argument actions, dest is normally supplied as the first argument to add_argument(). For optional argument actions, the value of dest is normally inferred from the option strings. [two irrelevant sentences about the inference process] Any internal - characters will be converted to _ characters to make sure the string is a valid attribute name.
I suppose it's possible to read the "Any internal - characters" sentence as part of the process "for optional argument actions", but it is equally possible to read it as applying to all argument actions, and it makes more sense for it to be applied to all arguments, because the rationale -- "to make sure the string is a valid attribute name" -- applies to all arguments.
Aggravating the problem, the documentation, together with the behavior of argparse if you give both a first non-keyword argument and a dest= argument to add_argument for a positional (i.e. throwing an error), makes it seem like it is impossible to manually make the attribute and visible name of a positional be different. It is actually possible, but the only way to do it is by giving add_argument no positional arguments and specifying both dest= and metavar=, which I only realized was a valid thing to do when I found issue #117834. I was actually about to start monkey-patching argparse to work around this issue before I thought to check for bug reports.
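A minimal sketch of the dest= plus metavar= form described above. The names input_file and input-file are illustrative, and this reflects the behavior of the argparse versions I am aware of, not a documented guarantee.

```python
import argparse

parser = argparse.ArgumentParser(prog="example")
# No name string is passed at all: dest names the namespace attribute,
# metavar is what usage and help display.
parser.add_argument(dest="input_file", metavar="input-file")

args = parser.parse_args(["in.txt"])
usage = parser.format_usage()

print(usage)
print(args.input_file)
```

Because no option string is given, argparse still treats this as a positional.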
I believe backward compatibility can be ensured by having ap.add_argument("input-file", ...) set both input-file and input_file in the namespace object, in the absence of an explicit dest= parameter.
4. I believe backward compatibility can be ensured by having ap.add_argument("input-file", ...) set both input-file and input_file in the namespace object, in the absence of an explicit dest= parameter.
Unless someone is iterating over the namespace and doesn't expect to find the same thing twice.
I'm not familiar with your point # 3. I'll have to read up on it.
Setting both 'input-file' and 'input_file' should be thoroughly tested, if done. Nothing else sets two dest (that I can think of). That includes all action subclasses. 'append', for example, fetches the existing value. Also, setting and checking the defaults (for 'required') could be messed up. And 'parents'. I haven't looked at the proposed changes, but offhand point 4 feels like a minefield of bugs. The unittest cases would have to cover this.
Normally the dest of a positional is set by the first non-keyword argument. argparse objects to a keyword dest because it isn't prepared to choose one over the other. Since your end user doesn't use this dest, you should choose it based on what's convenient for you. The metavar is available if the usage and help need something else. Special characters like '-' preclude using the dest as an args attribute, but are otherwise OK.
I still prefer only a documentation change. Your end users will never encounter an error or bug with this issue. And you, the programmer, will quickly see any mistaken assumptions with a print(args) during initial debugging.
To me it seems obvious that if dest is specified explicitly then it should win over anything inferred from the names list, whether or not the argument is an option.
Since .add_argument("this-thing", dest="this_thing") currently throws an exception there is no backward compatibility cost to permitting it. I acknowledge that my point #4 may not work out in practice; I hadn't thought of iterating over the namespace.
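The rejection mentioned here can be reproduced as follows. This is a sketch: the exception type is caught broadly on purpose, since I am only assuming it is ValueError or TypeError depending on the Python version.

```python
import argparse

parser = argparse.ArgumentParser(prog="example")

try:
    # A positional name and an explicit dest together are refused.
    parser.add_argument("this-thing", dest="this_thing")
except (ValueError, TypeError) as exc:
    rejected = True
    print(type(exc).__name__, exc)
else:
    rejected = False

print(rejected)
```

Since this call fails today, allowing it would not change the behavior of any currently working program.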
IMO key normalization on access makes much more sense than storing duplicated values
--- a/Lib/argparse.py
+++ b/Lib/argparse.py
@@ -35,8 +35,32 @@ class Namespace(_AttributeHolder):
         return vars(self) == vars(other)
     def __contains__(self, key):
-        return key in self.__dict__
+        return self._get_normalized_key(key) is not None
+
+    def __getattr__(self, name):
+        normalized_key = self._get_normalized_key(name)
+        if normalized_key is not None:
+            return self.__dict__[normalized_key]
+        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
+
+    def __getitem__(self, key):
+        return self.__getattr__(key)
+
+    def _get_normalized_key(self, key):
+        if key in self.__dict__:
+            return key
+        alt_key = key.replace('-', '_') if '-' in key else key.replace('_', '-')
+        return alt_key if alt_key in self.__dict__ else None
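The same idea can be tried without patching the standard library by passing a custom namespace to parse_args. The class name NormalizingNamespace is hypothetical, and this sketch covers only attribute access, not the __contains__ and __getitem__ parts of the patch above.

```python
import argparse


class NormalizingNamespace(argparse.Namespace):
    """Hypothetical subclass: falls back to the other dash/underscore spelling."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        alt = name.replace("-", "_") if "-" in name else name.replace("_", "-")
        if alt != name and alt in self.__dict__:
            return self.__dict__[alt]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


parser = argparse.ArgumentParser(prog="example")
parser.add_argument("foo-bar")
args = parser.parse_args(["abc"], namespace=NormalizingNamespace())

print(args.foo_bar)        # resolved through the fallback
print(list(vars(args)))    # the value is still stored only once
```

Because the value is stored under a single key, iterating over vars(args) does not show a duplicate, which was the objection to setting both names.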
This is a duplicate of an older issue #59330, which also has patches and a PR. All proposed solutions have flaws and drawbacks.
Bug report
If you add an optional argument containing dashes to a parser with argparse, those are converted to _ automatically in the resulting Namespace object.
But if you add a positional argument containing a -, this is not converted, and the resulting error message suggests the argument name containing the - instead of the _. Which is of course not possible (without getattr), because it's not a valid variable name in Python.
This behaviour is misleading and undocumented and I'd suggest to convert - to _ in positional arguments too.
Reproduction code:
Results in:
Compounding this issue is the fact that you are prevented from using the dest option on add_argument to overwrite the name in the Namespace.
Your environment