lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
384 stars 46 forks source link

Parsing a List of dataclasses #271

Open JonasFrey96 opened 1 year ago

JonasFrey96 commented 1 year ago

Hello,

it would be great if we could add support to pass a List of dataclasses:

@dataclass
class ExperimentParams(Serializable):
        @dataclass
        class LayerParams:
            name: str
            loss_scale: float = 1.0

        target_layers: List[any] = field(
            default_factory=lambda LayerParams=LayerParams: [
                LayerParams( name="loss_a", loss_scale=1.0),
                LayerParams(name="loss_b", loss_scale=2.0),
            ]
        )

Is there any way to elegantly parse such a configuration file:

python train.py --target_layers_0.loss_scale 10

This relates to:

                raise NotImplementedError(
                    f"Field {field.name} is of type {field_type}, which isn't "
                    f"supported yet. (container of a dataclass type)"
                )

If someone can point me toward what to change, I am happy to contribute.

lebrice commented 1 year ago

Hello again @JonasFrey96 !

There is limited support for something similar, but not exactly the same:

https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_example.py

In particular, this example here might be of interest to you: https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py

Let me know what you think!

JonasFrey96 commented 1 year ago

In my specific use case, it would be great to allow for parsing a list of data classes. This would help a lot. I tried to look into the code, but it is quite overwhelming and seems to be none trivial.

lebrice commented 12 months ago

Hey @JonasFrey96 , did you look at this example? https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py

MiguelMonteiro commented 11 months ago

Hi, I think this feature would be very useful for things like lists of data augmentations in deep learning code. Is it planned to be implemented?

hugobb commented 11 months ago

Hi, I'm also interested in this feature. From my understanding the idea of the feature, would be to enable this type of hierarchical dataclass definition with List.

@dataclass
class CNNStack:
    name: str = "stack"
    num_layers: int = 3
    kernel_sizes: Tuple[int, int, int] = (7, 5, 5)
    num_filters: List[int] = field(default_factory=[32, 64, 64].copy)

@dataclass
class Config:
    stack_list: List[CNNStack] = field(default_factory=[CNNStack(), CNNStack(), CNNStack()].copy)

parser = ArgumentParser(conflict_resolution=ConflictResolution.ALWAYS_MERGE)
parser.add_arguments(Config, dest="config", default=Config())
args = parser.parse_args()
print(args)

However this is not supported yet, this only works if we manually construct the parser which in my use cases is not very convenient. I would also be interested in helping with this, but also not sure where to start.

hugobb commented 11 months ago

I think the challenging part and the main difference with the multiple_lists_example.py is that in the proposed feature the number of CNNStack would not be fixed but could vary depending on the number of arguments provided by user.

lebrice commented 11 months ago

Hey there @hugobb , @MiguelMonteiro , thanks for commenting.

I agree with you, this feature would be great to have.

@hugobb you're right: this would be challenging to implement, especially if we don't know how many dataclasses to parse in advance!

I guess a first step could be to make this work when we know the number of classes in advance, for instance something like cnn_blocks: tuple[CnnBlock, CnnBlock] . (As far as I recall, this isnt already supported, but I might be wrong, it's been a while since I've worked on this)

Then, as a next step, we could probably figure out the number of values by inspecting the parsed values during postprocessing, or by adding an argument that can be used explicitly, like --cnn_blocks=2 or similar.

This feature will probably require a substantial refactoring of the current "reuse" feature, which is quite messy:

Here are some design-y questions to start us off:

Thanks again @hugobb @MiguelMonteiro @JonasFrey96 for posting. Lets keep this ball rolling.

MiguelMonteiro commented 11 months ago

Hi @lebrice, thanks for your answer.

what differences would there be (if any) with the current reuse feature with respect to how the values for each dataclass are passed from the command-line?

I am not sure how this feature works, sorry

How would you expect to specify a different value for one of the fields of, say, the last dataclass in the list?

Perhaps the most intuitive way is for the parser to automatically append ".1", ".2", etc... to the end of the name. For example: --cnn_stack.1.num_layers=2 --cnn_stack.2.num_layers=5

how should the number of dataclasses to parse be determined? Should it be explicitly passed via an argument, or should it be inferred from the passed values? Will this always be possible? What should we do if/when it isn't?

Ideally, it would be infered from the arguemnts passed from the command line.

What I am still unsure is if it would ever be possible to handle lists with different dataclasses in them. This would be very useful to configure data-augmentations for example.