lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
401 stars 50 forks source link

'set' type seems to not be properly supported #95

Open nacitar opened 2 years ago

nacitar commented 2 years ago

Describe the bug The 'set' type does not seem to be properly supported.

To Reproduce Reproduction code: test.py

#!/usr/bin/env python3
from dataclasses import dataclass
from simple_parsing import ArgumentParser

parser = ArgumentParser()

@dataclass
class Options:
    set_things: set[str]
    list_things: list[str]

parser.add_arguments(Options, dest="options")

args = parser.parse_args()
print("options:", args.options)

Reproduction invocation:

$ ./test.py --list_things a b a --set_things a b a

Expected behavior

$ ./test.py --list_things a b a --set_things a b a
options: Options(set_things={'a', 'b'}, list_things=['a', 'b', 'a'])`

Actual behavior

$ ./test.py --list_things a b a --set_things a b a
usage: test.py [-h] --set_things set --list_things list
test.py: error: unrecognized arguments: b a

Desktop (please complete the following information):

Additional context If you only pass in a single value to --set_things, it properly creates a set from it.

I can't imagine that supporting this would be hard given the current list() support... and while I could simply take a list and create a set from it, that feels completely unnecessary.

lebrice commented 2 years ago

Interesting, thanks @nacitar for pointing this out! Not 100% sure, but this might also have something to do with python 3.9. I'll add some tests for it, but first, can you actually get a set argument using just add_argument from argparse? All I really need to add support is to have the mapping from the dataclasses.Field object to the arguments to pass to add_argument.

nacitar commented 2 years ago

There's going to be more to it here factoring in set[int] vs set[str] and such but that's not something you'll need help with from me. I do, however, think I know the pieces of the the solution you're missing to tie it all together.

I'm not an expert on argparse's internals, but I did look at its code a bit to answer this. The way argparse wants things to work is you give 'type' to add_argument to say the ultimate end value type, rather than any container around it (e.g., for list[int] you just give int), and then you specify that you want this to be a list via nargs, and then internally it accumulates all of the values provided into a list accordingly. At a glance that's where the story ends with respect to arguments that take multiple values; you just get a list.

However, I don't fully understand any nuances as I've only just determined this approach when trying to answer your question, but Action classes seem to be the solution here:

#!/usr/bin/env python3
# argtest.py
import argparse

class customAction(argparse.Action):
    def __call__(self, parser, args, values, option_string=None):
        # the call to set() here is the magic
        setattr(args, self.dest, set(values))

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('-e', '--example', action=customAction, nargs='*')
args = parser.parse_args()

print(repr(args.example))

Testing shows that this works as expected:

$ ./argtest.py -e
set()
$ ./argtest.py -e 1
{'1'}
$ ./argtest.py -e 1 2
{'2', '1'}
$ ./argtest.py -e 1 2 3
{'2', '3', '1'}
$ ./argtest.py -e 1 2 3 1 2 2 2 2 3 3 4
{'2', '1', '4', '3'}

The iteration order of the entries in the set will vary each time it's ran because of PYTHONHASHSEED but that's expected.

Hopefully that helps.

lebrice commented 2 years ago

In a way, it does help, but this basically confirms what I suspected: There is no way to use purely just the type, action, nargs etc from add_argument to get set fields. We'd basically need to just add a "postprocessing" step of converting the parsed values for set fields from lists to sets (inside the ArgumentParser._postprocessing method). Using a custom action would also work, but I try not to use those, since IIRC we'd also need to (re)implement the metavar, nargs, etc behaviour from argparse.

This should be pretty easy to add, since there's also already a is_set function in utils.py, so that could be used. I can't quite do it myself at this very moment. I'll mark this as a good first issue, and if anyone is interested in contributing, please ask any question below and I'll be more than happy to give you some pointers.