python / cpython

The Python programming language
https://www.python.org

Argparse `add_argument` treats `-` in flags differently to `-` in positional arguments. #95100

Open CSDUMMI opened 2 years ago

CSDUMMI commented 2 years ago

Bug report

If you add an optional argument to a parser with argparse containing dashes, those are converted to _ automatically in the resulting Namespace object.

But if you add a positional argument containing a -, this is not converted and the resulting error message suggests the argument name containing the - instead of the _. Which is of course not possible (without getattr), because it's not a valid variable name in Python.

This behaviour is misleading and undocumented and I'd suggest to convert - to _ in positional arguments too.

Reproduction code:

import argparse

parser = argparse.ArgumentParser("example")

parser.add_argument("foo-bar", type=str)

args = parser.parse_args()

print("getattr", getattr(args, "foo-bar"))

print("- replaced by _", args.foo_bar)

Results in:

$ python3 main.py abc
getattr abc
Traceback (most recent call last):
  File "/tmp/main.py", line 11, in <module>
    print("- replaced by _", args.foo_bar)
AttributeError: 'Namespace' object has no attribute 'foo_bar'. Did you mean: 'foo-bar'?

Compounding this issue is the fact that you are prevented from using the dest option on add_argument to override the name in the Namespace.
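A minimal sketch of that restriction, assuming current argparse behavior: combining a positional name with an explicit dest= makes add_argument raise a ValueError. The variable name `error` is only for illustration.

```python
import argparse

parser = argparse.ArgumentParser("example")

try:
    # A positional name string plus an explicit dest= is rejected outright.
    parser.add_argument("foo-bar", dest="foo_bar")
except ValueError as exc:
    error = str(exc)
    print("rejected:", error)
else:
    error = None
    print("accepted")
```

The exact wording of the message may differ between Python versions, so the sketch only relies on the exception type.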


ericvsmith commented 2 years ago

I’ve been bitten by this before, and I think it would be a nice feature. But wouldn’t it break code currently using getattr?

CSDUMMI commented 2 years ago

You are right.

Is that happening a lot?

In that case, a work-around could be added to make it possible to use getattr with the dash name.

But that's not very nice, right?

I'd expect that this is really not a lot of work, and that removing a workaround and replacing it with the convention should benefit the code quality of any project currently using it.

ericvsmith commented 2 years ago

I guess you could make it available under both names, but I'm not sure it's worth the hassle.

I don't think we should break existing code, even if it would be better off with the change.

CSDUMMI commented 2 years ago

PEP 387 lays out the rules for backward compatibility in Python.

Given a set of arguments, the return value, side effects, and raised exceptions of a function. This does not preclude changes from reasonable bug fixes.

I would consider this a reasonable bug fix, because this behavior is undocumented and diverges from the convention explicitly documented for optional arguments in argparse.

ericvsmith commented 2 years ago

In my years working on Python, I've learned that every change will break something.

CSDUMMI commented 2 years ago

In that case, breaking something should not be an argument against changing something.

CSDUMMI commented 2 years ago

I fear that adding a workaround to support code using getattr and dash names could lead to other problems.

For example, if you iterate over the resulting namespace, use vars or dir on it, a workaround to allow for both the - and _ version of a positional argument name would lead to the same argument being contained twice within the namespace.

With the work around enabled, this would happen:


import argparse

parser = argparse.ArgumentParser("example")

parser.add_argument("foo-bar", type=str)

args = parser.parse_args(["abc"])

print(vars(args)) # { "foo-bar" :  "abc", "foo_bar" : "abc" }

And that is yet another undocumented, unintuitive behavior with potential for breaking existing code.

ericvsmith commented 2 years ago

In that case, breaking something should not be an argument against changing something.

I don’t think that follows.

But my point is the bar for change is high, and I don’t think this meets the criteria.

CSDUMMI commented 2 years ago

I consider it an undocumented, unintuitive and not obvious behavior.

Behavior that is not documented is also behavior that is not guaranteed.

And I would also wager that it was not intentional.

Thus this should be considered as a reasonable bug fix.

It must be either fixed or at the very least be documented in the argparse documentation. Not everybody will be familiar with getattr and know that it can be used to work around this bug.

ericvsmith commented 2 years ago

I agree it should at least be documented.

CSDUMMI commented 2 years ago

So:

If your positional argument name contains a -, you must use getattr(args, "foo-bar") instead of args.foo_bar, because this might break backwards compatibility if fixed.

CSDUMMI commented 2 years ago

Explaining that note in the documentation might be a little hard.

CSDUMMI commented 2 years ago

The problem here is that the longer it is not fixed, the harder it will be to fix it eventually.

And who really wants to have to explain this behavior indefinitely to people new to argparse?

ericvsmith commented 2 years ago

The problem here is that the longer it is not fixed, the harder it will be to fix it eventually.

That's always true for all such changes.

I'm not saying it shouldn't be done. I'm saying we'll probably break working code, and that's a very high bar for a change. I occasionally make changes to argparse, but here I'll wait for other core devs to weigh in. In the meantime, a doc PR would be welcomed.

CSDUMMI commented 2 years ago

I'm having trouble finding the right place to add a warning about this behavior.

CSDUMMI commented 2 years ago

To be more specific:

Should the warning + work around be added as inline documentation of argparse module or as part of Doc/library/argparse.rst?

hpaulj commented 1 year ago

For optionals, there is a POSIX convention of accepting dashes in the flag strings, so conversion to underscore makes sense.

For positionals, there isn't any good reason to use dashes - unless you want to make life difficult for yourself. You are free to use any 'dest' string, even ones that start with numbers and contain odd characters. Internally, argparse uses the dest with setattr/getattr, so it isn't bothered by odd characters.

And if you must have dashes in the usage or help, use the metavar.
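A short sketch of the metavar suggestion, with made-up argument names (`input_file`, `--output-file`): the positional keeps an underscore name as its dest, and metavar supplies the dashed spelling shown in usage and help.

```python
import argparse

parser = argparse.ArgumentParser("example")
# dest is "input_file" (a valid attribute name); metavar only changes the display.
parser.add_argument("input_file", metavar="input-file")
parser.add_argument("--output-file")

args = parser.parse_args(["in.txt", "--output-file", "out.txt"])
usage = parser.format_usage()

print(usage)
print(args.input_file, args.output_file)
```

Both values are then reachable with plain attribute access, while the usage line shows the dashed name for the positional.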

During the debugging phase it's a good idea to include a 'print(args)' line, so you aren't surprised by changes, or non-changes, to the 'dest'.

In _get_optional_kwargs, the dash replace is done only when it is inferred from the long option string. It is not done when you provide an explicit 'dest' parameter.

For a positional, _get_positional_kwargs gets the first (and only) string as the 'dest'. It does not do any checking or replacement.
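The difference can be observed without reading the private helpers, by inspecting the dest attribute of the Action objects that add_argument returns. This is a sketch with invented flag names, and it describes behavior as of the Python versions discussed in this thread.

```python
import argparse

parser = argparse.ArgumentParser("example")

# dest inferred from the long flag: dashes become underscores.
inferred = parser.add_argument("--foo-bar")
# Explicit dest on an optional: kept verbatim, dash included.
explicit = parser.add_argument("--baz-qux", dest="baz-qux")
# Positional: the first (and only) string is used as the dest unchanged.
positional = parser.add_argument("in-file")

print(inferred.dest, explicit.dest, positional.dest)
```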

I think the 'dest' documentation is clear enough. "For positional argument actions, dest is normally supplied as the first argument ". The dash conversion is clearly identified as an optionals feature.

zackw commented 3 months ago

  1. Yes, there is a good reason to want to use dashes for positional arguments: consistency with optional arguments. It looks bad, unprofessional, like I didn't bother copyediting my documentation, to have both "input_file" and "--output-file" in the --help output.

  2. The dash conversion is not clearly identified as optionals-only. The actual text of the documentation is

    For positional argument actions, dest is normally supplied as the first argument to add_argument(). For optional argument actions, the value of dest is normally inferred from the option strings. [two irrelevant sentences about the inference process]
    Any internal - characters will be converted to _ characters to make sure the string is a valid attribute name.

    I suppose it's possible to read the "Any internal - characters" sentence as part of the process "for optional argument actions", but it is equally possible to read it as applying to all argument actions, and it makes more sense for it to be applied to all arguments, because the rationale -- "to make sure the string is a valid attribute name" -- applies to all arguments.

  3. Aggravating the problem, the documentation, together with the behavior of argparse if you give both a first non-keyword argument and a dest= argument to add_argument for a positional (i.e. throwing an error) makes it seem like it is impossible to manually make the attribute and visible name of a positional be different. It is actually possible, but the only way to do it is by giving add_argument no positional arguments and specifying both dest= and metavar=, which I only realized was a valid thing to do when I found issue #117834. I was actually about to start monkey-patching argparse to work around this issue before I thought to check for bug reports.

  4. I believe backward compatibility can be ensured by having ap.add_argument("input-file", ...) set both input-file and input_file in the namespace object, in the absence of an explicit dest= parameter.
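The form described in point 3 can be sketched as follows, assuming it keeps working as reported in issue #117834: add_argument is called with no name string at all, dest= names the attribute, and metavar= supplies the visible name. The argument name and file name are illustrative.

```python
import argparse

parser = argparse.ArgumentParser("example")
# No positional name string: dest= sets the attribute, metavar= the display name.
parser.add_argument(dest="input_file", metavar="input-file")

args = parser.parse_args(["data.csv"])
usage = parser.format_usage()

print(usage)
print(args.input_file)
```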

ericvsmith commented 3 months ago

4. I believe backward compatibility can be ensured by having ap.add_argument("input-file", ...) set both input-file and input_file in the namespace object, in the absence of an explicit dest= parameter.

Unless someone is iterating over the namespace and doesn't expect to find the same thing twice.

I'm not familiar with your point # 3. I'll have to read up on it.

hpaulj commented 3 months ago

Setting both 'input-file' and 'input_file' should be thoroughly tested, if done. Nothing else sets two dest (that I can think of). That includes all action subclasses. 'append' for example fetches the existing value. Also setting and checking the defaults (for 'required') could be messed up. And 'parents'. I haven't looked at the proposed changes, but offhand point 4 feels like a minefield of bugs. The unittest cases would have to cover this.


hpaulj commented 3 months ago

Normally the dest of a positional is set by the first non-keyword argument. It objects to a keyword dest because it isn't prepared to choose one over the other. Since your end user doesn't use this dest, you should choose it based on what's convenient for you. The metavar is available if the usage and help need something else. Special characters like '-' preclude using the dest as an args attribute, but otherwise are OK.

I still prefer only a documentation change. Your end users will never encounter an error or bug with this issue. And you, the programmer, will quickly see any mistaken assumptions with a print(args) during initial debugging.


zackw commented 3 months ago

To me it seems obvious that if dest is specified explicitly then it should win over anything inferred from the names list, whether or not the argument is an option.

Since .add_argument("this-thing", dest="this_thing") currently throws an exception there is no backward compatibility cost to permitting it. I acknowledge that my point #4 may not work out in practice; I hadn't thought of iterating over the namespace.

rindeal commented 3 months ago

IMO key normalization on access makes much more sense than storing duplicated values

--- a/Lib/argparse.py
+++ b/Lib/argparse.py
@@ -35,8 +35,32 @@ class Namespace(_AttributeHolder):
         return vars(self) == vars(other)

     def __contains__(self, key):
-        return key in self.__dict__
+        return self._get_normalized_key(key) is not None
+
+    def __getattr__(self, name):
+        normalized_key = self._get_normalized_key(name)
+        if normalized_key is not None:
+            return self.__dict__[normalized_key]
+        raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
+
+    def __getitem__(self, key):
+        return self.__getattr__(key)
+
+    def _get_normalized_key(self, key):
+        if key in self.__dict__:
+            return key
+        alt_key = key.replace('-', '_') if '-' in key else key.replace('_', '-')
+        return alt_key if alt_key in self.__dict__ else None
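The same idea can be tried today without patching the standard library, by passing a custom namespace object to parse_args. The sketch below is a user-side approximation of the diff, not the proposed patch itself; the class name `DashTolerantNamespace` and the helper `_find_key` are hypothetical.

```python
import argparse


class DashTolerantNamespace(argparse.Namespace):
    """Hypothetical Namespace that treats '-' and '_' as interchangeable on lookup."""

    def _find_key(self, name):
        # Prefer the exact key, then try the spelling with '-' and '_' swapped.
        stored = vars(self)
        if name in stored:
            return name
        alt = name.replace("-", "_") if "-" in name else name.replace("_", "-")
        return alt if alt in stored else None

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        key = self._find_key(name)
        if key is None:
            raise AttributeError(
                f"{type(self).__name__!r} object has no attribute {name!r}"
            )
        return vars(self)[key]

    def __contains__(self, key):
        return self._find_key(key) is not None


parser = argparse.ArgumentParser("example")
parser.add_argument("foo-bar", type=str)
args = parser.parse_args(["abc"], namespace=DashTolerantNamespace())

print(args.foo_bar, getattr(args, "foo-bar"))
print(vars(args))
```

Because the value is stored once and only the lookup is normalized, vars(args) still contains a single entry. Names that mix dashes and underscores are only matched in their all-underscore form, which is one of the edge cases such an approach would have to settle.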

serhiy-storchaka commented 1 week ago

This is a duplicate of an older issue #59330, which also has patches and PR. All proposed solutions have flaws and drawbacks.