Open JonasFrey96 opened 1 year ago
Hello again @JonasFrey96 !
There is limited support for something similar, but not exactly the same:
https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_example.py
In particular, this example here might be of interest to you: https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py
Let me know what you think!
In my specific use case, it would be great to allow for parsing a list of data classes. This would help a lot. I tried to look into the code, but it is quite overwhelming and seems to be none trivial.
Hey @JonasFrey96 , did you look at this example? https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py
Hi, I think this feature would be very useful for things like lists of data augmentations in deep learning code. Is it planned to be implemented?
Hi, I'm also interested in this feature. From my understanding the idea of the feature, would be to enable this type of hierarchical dataclass definition with List.
@dataclass
class CNNStack:
name: str = "stack"
num_layers: int = 3
kernel_sizes: Tuple[int, int, int] = (7, 5, 5)
num_filters: List[int] = field(default_factory=[32, 64, 64].copy)
@dataclass
class Config:
stack_list: List[CNNStack] = field(default_factory=[CNNStack(), CNNStack(), CNNStack()].copy)
parser = ArgumentParser(conflict_resolution=ConflictResolution.ALWAYS_MERGE)
parser.add_arguments(Config, dest="config", default=Config())
args = parser.parse_args()
print(args)
However this is not supported yet, this only works if we manually construct the parser which in my use cases is not very convenient. I would also be interested in helping with this, but also not sure where to start.
I think the challenging part and the main difference with the multiple_lists_example.py is that in the proposed feature the number of CNNStack would not be fixed but could vary depending on the number of arguments provided by user.
Hey there @hugobb , @MiguelMonteiro , thanks for commenting.
I agree with you, this feature would be great to have.
@hugobb you're right: this would be challenging to implement, especially if we don't know how many dataclasses to parse in advance!
I guess a first step could be to make this work when we know the number of classes in advance, for instance something like cnn_blocks: tuple[CnnBlock, CnnBlock]
. (As far as I recall, this isnt already supported, but I might be wrong, it's been a while since I've worked on this)
Then, as a next step, we could probably figure out the number of values by inspecting the parsed values during postprocessing, or by adding an argument that can be used explicitly, like --cnn_blocks=2
or similar.
This feature will probably require a substantial refactoring of the current "reuse" feature, which is quite messy:
the parsing functions would need to share the information of how many values are expected (IIRC this is already partly the case, but I'd like to make this more explicit)
I could potentially invest some time in order to try to get this to work, but I'd like to get a bit more info if possible before that.
Here are some design-y questions to start us off:
Thanks again @hugobb @MiguelMonteiro @JonasFrey96 for posting. Lets keep this ball rolling.
Hi @lebrice, thanks for your answer.
what differences would there be (if any) with the current reuse feature with respect to how the values for each dataclass are passed from the command-line?
I am not sure how this feature works, sorry
How would you expect to specify a different value for one of the fields of, say, the last dataclass in the list?
Perhaps the most intuitive way is for the parser to automatically append ".1", ".2", etc... to the end of the name. For example: --cnn_stack.1.num_layers=2 --cnn_stack.2.num_layers=5
how should the number of dataclasses to parse be determined? Should it be explicitly passed via an argument, or should it be inferred from the passed values? Will this always be possible? What should we do if/when it isn't?
Ideally, it would be infered from the arguemnts passed from the command line.
What I am still unsure is if it would ever be possible to handle lists with different dataclasses in them. This would be very useful to configure data-augmentations for example.
Hello,
it would be great if we could add support to pass a List of dataclasses:
Is there any way to elegantly parse such a configuration file:
This relates to:
If someone can point me toward what to change, I am happy to contribute.