lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
410 stars 52 forks source link

`nargs` default for optional regular fields #189

Open pvandyken opened 1 year ago

pvandyken commented 1 year ago

Describe the bug Not exactly a bug, more of a behavioural question:

For optional arguments such as in the reproduction section below, the default behaviour is to allow passing --arg without any parameters, i.e. nargs='?'. This differs from argparse, where the default is to require exactly one param, i.e. nargs=None.

I noticed this exact issue mentioned in the source and tend to agree with the comment that retaining the default argparse behaviour is more intuitive. Just switching the line there to _arg_options['nargs'] = None led to the error below, however, so it's obviously more complicated than it seems.

I was just wondering if you had any plans yet to switch the default, or if the current behaviour is here to stay.

Traceback (most recent call last):
...
  File "/.../lib/python3.9/argparse.py", line 1825, in parse_args
    args, argv = self.parse_known_args(args, namespace)
  File "/.../lib/python3.9/site-packages/simple_parsing/parsing.py", line 324, in parse_known_args
    self._preprocessing()
  File "/.../lib/python3.9/site-packages/simple_parsing/parsing.py", line 550, in _preprocessing
    wrapper.add_arguments(parser=self)
  File "/.../lib/python3.9/site-packages/simple_parsing/wrappers/dataclass_wrapper.py", line 156, in add_arguments
    group.add_argument(*wrapped_field.option_strings, **options)
  File "/.../lib/python3.9/argparse.py", line 1423, in add_argument
    action = action_class(**kwargs)
  File "/.../lib/python3.9/argparse.py", line 922, in __init__
    raise ValueError('nargs must be %r to supply const' % OPTIONAL)
ValueError: nargs must be '?' to supply const

To Reproduce

from simple_parsing import ArgumentParser
from dataclasses import dataclass

@dataclass
class Foo:
   bar: str | None = None

if __name__ == "__main__":
   parser = ArgumentParser()
   parser.add_arguments(Foo, "foo")
   args = parser.parse_args(["--bar"])
   foo: Foo = args.foo
   print(foo)

Expected behavior A clear and concise description of what you expected to happen.

$ python issue.py --bar
usage: issue.py [--bar str]
issue.py: error: argument --bar expected one argument

Actual behavior A clear and concise description of what is happening.

$ python issue.py --bar
Foo(bar=None)

Desktop (please complete the following information):

Additional context Add any other context about the problem here.

lebrice commented 1 year ago

Hey there @pvandyken , thanks for posting this. Sorry I didn't get to it, I was not checking GitHub during the holidays.

Yes, you're right, good catch. This doesn't really make sense. What should nargs be set to in this case?

I'd be super happy to help you if you want to make a small PR for this. Otherwise I can also take a crack at it eventually. The main thing would be to change this nargs to self.nargs I think. Something like that anyway. And then to add a small test that checks that this error is raised, with that error message.

# somewhere in the test folder, probably in test_optional.py
import pytest
from simple_parsing import ArgumentParser
from dataclasses import dataclass
from .testutils import TestSetup, exits_and_writes_to_stdout # (or whatever it's called)

@dataclass
class Foo(TestSetup):
   bar: str | None = None

def test_passing_no_value_raises_error():
    with exits_and_writes_to_stdout("error: argument --bar expected one argument"):
        _ = Foo.setup("--bar")

Then there might be some strings in the tests that need tweaking, since the nargs has an effect on the output of the --help command. But apart from that, I don't think I've been depending on this weird behaviour anywhere (passing a flag without a value). I'd have to double-check though, just to make sure this doesn't introduce too many backward-incompatible changes.

Thanks again for posting this, let me know what you think!

pvandyken commented 1 year ago

No worries about the delay, haha, I was also on holiday.

What should nargs be set to in this case?

I would say it should be set to the argparse default of None, in which case the number of args is based on the action. For the default (store), it should accept exactly one arg.

Happy to take a look when I have time, although that may not be for a bit.

lebrice commented 1 year ago

Ok I'm not quite sure that this is an issue anymore. Let me try to explain. If a field has a default value, then it is necessarily an optional argument, in the sense that you're able to not specify the argument from the command-line. The Optional[T] or T | None annotations indicate that the value (passed from the command-line) can either be of type T or be None. How would you expect to pass the value of None from the command-line? If the annotation is Optional[str], then it might become tricky to differentiate between the empty string "", a None value, and a "None" when you receive --bar=None, no?

This is therefore why I chose that passing no value would correspond to passing None.

Does that make sense? What do you think? Are there other alternatives?

pvandyken commented 1 year ago

None would be the default if the cli param is not passed at all. E.g the three alternatives are:

1. mycli
2. mycli --arg
3. mycli --arg param

The default behaviour of argparse is to capture 1 and 3. If 1, the value will be None (or whatever the default is).

(At least, this is my understanding, I'm not actually playing with it at the moment)

lebrice commented 1 year ago

:thinking: This makes sense. I'll keep thinking about this one for a bit, I'll get back to you.

lebrice commented 1 year ago

What if the field is of type Optional[str] and has a default value? How would you then specify that the value should actually be None, from the command-line?

pvandyken commented 1 year ago

Hmm, interesting.

This is getting a bit complicated, so I went ahead and wrote up mappings of all the possibilities.

First, I'll restate my goal here as getting a clear, priority syntax for:

parser.add_argument("--myarg", default=None) # fully: parser.add_argument("--myarg", action="store", default=None, nargs=None)

which currently, afaik, does not exist.

This is the current situation:

class CurrentSituation:
    optionaldefault: Optional[str] = "some string"
    """
    # parser.add_argument("--optionaldefault", action="store", nargs="?", default="some string")
    cli                                        -> optionaldefault = "some string"
    cli --optionaldefault                      -> optionaldefault = None
    cli --optionaldefault "a different string" -> optionaldefault = "a different string"
    """

    nonoptional: str = "some string"
    """
    # parser.add_argument("--nonoptional", action="store", nargs=None, default="some string")
    cli                                    -> nonoptional = "some string"
    cli --nonoptional                      -> Error
    cli --nonoptional "a different string" -> nonoptional = "a different string"
    """

    optionalNone: Optional[str] = None
    """
    # parser.add_argument("--optionalNone", action="store", nargs="?", default=None)
    cli                                     -> optionalNone = None
    cli --optionalNone                      -> optionalNone = None
    cli --optionalNone "a different string" -> optionalNone = "a different string"
    """

    optional: Optional[str]
    """
    # parser.add_argument("--optional", action="store", nargs="?", default=None)
    cli                                 -> optional = None
    cli --optional                      -> optional = None
    cli --optional "a different string" -> optional = "a different string"
    """

The last two forms are degenerate, both mapping to the same thing. So what we could do instead is the following:

class Alternative1:
    optionalNone: Optional[str] = None
    """
    # parser.add_argument("--optionalNone", action="store", nargs="?", default=None)
    cli                                     -> optionalNone = None
    cli --optionalNone                      -> optionalNone = None
    cli --optionalNone "a different string" -> optionalNone = "a different string"
    """

    mandatoryoptional: Optional[str]
    """
    # parser.add_argument("--optional", action="store", nargs=None, default=None)
    cli                                 -> optional = None
    cli --optional                      -> Error
    cli --optional "a different string" -> optional = "a different string"
    """

This removes the degeneracy, and makes what I want possible. My concern is that the mapping is really confusing, since optional and nonoptional are equivalent in outcome but opposite in form. It would be great if adding the Optional type simply enabled line two of each example, but we can't have optional: str = None without raising typing errors.

For reference, what I guess I originally proposed is something like this

class ArgparseDefaults:
    optionaldefault: Optional[str] = "some string"
    """
    # parser.add_argument("--optionaldefault", action="store", nargs=None, default="some string")
    cli                                        -> optionaldefault = "some string"
    cli --optionaldefault                      -> Error
    cli --optionaldefault "a different string" -> optionaldefault = "a different string"
    """

    nonoptional: str = "some string"
    """
    # parser.add_argument("--nonoptional", action="store", nargs=None, default="some string")
    cli                                    -> nonoptional = "some string"
    cli --nonoptional                      -> Error
    cli --nonoptional "a different string" -> nonoptional = "a different string"
    """

    optionalNone: Optional[str] = None
    """
    # parser.add_argument("--optionalNone", action="store", nargs=None, const=None)
    cli                                     -> optionalNone = None
    cli --optionalNone                      -> Error
    cli --optionalNone "a different string" -> optionalNone = "a different string"
    """

    optional: Optional[str]
    """
    # parser.add_argument("--optional", action="store", nargs=None, const=None)
    cli                                 -> optional = None
    cli --optional                      -> Error
    cli --optional "a different string" -> optional = "a different string"
    """

This is obviously very degenerate, and makes the Optional annotation under optionaldefault superfluous. The advantage is that it sticks to argparse defaults and makes things explicit: if you want nargs="?" you need to set it manually.

I do see the appeal in allowing the type annotations to set nargs (and it works great for List[...]), but I think achievement of the above goal in a way that's not confusing is more important.

At any rate, unless I'm missing something, this seems to be the entire situation, so it's however you want to move forward from here.

lebrice commented 1 year ago

Thanks for laying all this out @pvandyken. I think I agree with you (might have to re-read it again more carefully). I don't see an answer to my original question though.. If we were to make the optional case behave like you want it to, how would you allow passing a value of None from the command-line?

What if the field is of type Optional[str] and has a default value? How would you then specify that the value should actually be None, from the command-line?

lebrice commented 1 year ago

Oh! Also, I probably should have mentioned this at the start, but you can overwrite the auto-generated argparse arguments by passing them to the simple_parsing.field function. For instance, you can do this:

@dataclass
class Config:
    my_arg: Optional[str] = simple_parsing.field(default=None, action="store", nargs=None)

I believe this should do what you want. You can also customize any of the usual arguments to parser.add_argument by passing them to the field function. This saves them in the field metadata, and those entries overwrite those generated by simple_parsing.

pvandyken commented 1 year ago

If we went with what I originally proposed, you would need to explicitly set it to field("some default", nargs="?"). Otherwise it wouldn't be possible.

I'm not at all married to my original proposal, I hadn't written everything else at that point. It's a way of solving the problem, but it has its own disadvantages, as per the above paragraph. If there's a better way, I'm all for it!