mlcommons / GaNDLF

A generalizable application framework for segmentation, regression, and classification using PyTorch
https://gandlf.org
Apache License 2.0

Improved configuration parsing #758

Open sarthakpati opened 8 months ago

sarthakpati commented 8 months ago

Is your feature request related to a problem? Please describe.
Currently, configuration parsing is done by a single submodule [ref], which creates maintenance issues as more functionality is introduced.

Describe the solution you'd like
Using a more standardized configuration validation mechanism would make the developer and user experience much better. Some examples are (and this is by no means an exhaustive list):

Package     | Documentation                                 | Example
Cerberus    | https://docs.python-cerberus.org/             | https://stackoverflow.com/a/46626418
Pydantic    | https://docs.pydantic.dev/latest/             | https://stackoverflow.com/a/61021183
Marshmallow | https://marshmallow.readthedocs.io/en/stable/ | https://stackoverflow.com/a/63739747

Describe alternatives you've considered
N.A.

Additional context
It would make sense to eventually move GaNDLF's various functionalities to something like Hydra, and this could potentially be a good starting point.

vedik2002 commented 8 months ago

Hi @sarthakpati, can you please assign this issue to me?

sarthakpati commented 8 months ago

Thanks, done.

github-actions[bot] commented 6 months ago

Stale issue message

github-actions[bot] commented 4 months ago

Stale issue message

sarthakpati commented 3 months ago

Hi @vedik2002, are you still working on this?

github-actions[bot] commented 1 month ago

Stale issue message

sarthakpati commented 2 weeks ago

Hey @vedik2002, are you still working on this?

vedik2002 commented 2 weeks ago

Hi @sarthakpati, can you please remove me from this issue?

sarthakpati commented 2 weeks ago

Here are a few viable alternatives to our current configuration management system:

  1. Data Classes:

    • The dataclasses module [ref] provides a decorator and functions for automatically adding special methods (such as __init__, __repr__, and __eq__) to user-defined classes. This can make the code cleaner and less repetitive when dealing with a large number of parameters.

      from dataclasses import dataclass

      @dataclass
      class Parameters:
          param1: int
          param2: float
          param3: str

  2. Named Tuples:

    • The namedtuple factory function in the collections module [ref] can be used to create tuple subclasses with named fields. This can be faster than dictionaries for certain operations.

      from collections import namedtuple

      Parameters = namedtuple('Parameters', ['param1', 'param2', 'param3'])
      params = Parameters(param1=1, param2=2.0, param3='three')
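One caveat for both options: neither a dataclass nor a named tuple checks field types at runtime, so validation would have to be written by hand. Below is a minimal sketch of what that could look like with a dataclass. The field names and the parse_config helper are hypothetical, chosen to match the examples above, and are not GaNDLF's actual configuration keys or API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Parameters:
    # Hypothetical fields, mirroring the examples above.
    param1: int
    param2: float
    param3: str

    def __post_init__(self):
        # dataclasses do not enforce annotations, so check each field by hand.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def parse_config(raw: dict) -> Parameters:
    """Build Parameters from a plain dict, e.g. one loaded from YAML."""
    # Unknown or missing keys also raise TypeError via the generated __init__.
    return Parameters(**raw)


params = parse_config({"param1": 1, "param2": 2.0, "param3": "three"})
```

This simple isinstance check only works for plain types such as int or str. Nested or generic annotations (for example list[int]) would need more machinery, which is the kind of work a dedicated validation library takes over.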

Did you have something else in mind, @VukW?

VukW commented 1 week ago

I'd prefer to stick to pydantic models (BaseModel, BaseSettings), as they give the same experience and simplicity as a dataclass but with additional handy tools such as custom field validation, runtime type checking, etc.
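For illustration, here is a rough sketch of how such a model might look. It assumes pydantic is installed, and the field names are hypothetical rather than GaNDLF's actual configuration keys. It uses only BaseModel, Field, and ValidationError, which are available in both pydantic v1 and v2. Note that in v2, BaseSettings lives in the separate pydantic-settings package.

```python
from pydantic import BaseModel, Field, ValidationError


class Parameters(BaseModel):
    # Hypothetical fields, not GaNDLF's actual configuration keys.
    param1: int = Field(gt=0)  # required; must be a positive integer
    param2: float = 0.5        # optional; default used when the key is absent
    param3: str                # required


# A plain dict, e.g. one loaded from a YAML configuration file.
config = {"param1": 1, "param3": "three"}
params = Parameters(**config)

# Invalid input is rejected when the model is constructed.
error_count = 0
try:
    Parameters(param1="abc", param3="three")
except ValidationError as err:
    error_count = len(err.errors())
```

The class body reads much like the dataclass version, but type checks, defaults, and constraints such as gt=0 are enforced without any hand-written validation code.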

sarthakpati commented 1 week ago

Do you mean these?

VukW commented 1 week ago

@sarthakpati yes, exactly

sarthakpati commented 1 week ago

Cool, thanks. 👍🏽