Alternatives to `NamedTuples` for `Options`

Throughout flint we have been using NamedTuples as a basis for immutable structures that serve as interfaces between tasks and provide some level of first-order validation. Throughout the code these are referred to as Options, e.g. WSCleanOptions, GainCalOptions etc.

Though this has been very nice and has really helped to force some careful thinking when changing their state, there are some use cases where they are a little limiting and there might be nicer alternatives.

1 - Often the Options should be exposed to the CLI so that a user / tester can supply options for testing or bespoke operations. In the current form each option would have to be added manually through parser.add_argument. There are modules out there that can build a parser from dataclasses or pydantic models.

2 - Inability to add methods to the NamedTuple base that we sub-class from. Since Options are immutable by design, a .with_options method is often attached to each of the Options classes to provide a way of updating specific attributes. So far I have not found a nice / consistent way of attaching additional methods like .with_options to the NamedTuple class itself, so we have to keep repeating the same method.
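To make point 2 concrete, here is a minimal sketch of the kind of repetition involved. ExampleOptions and its fields are hypothetical stand-ins rather than actual flint classes, and the method body is only an assumption about what a typical .with_options looks like.

```python
from typing import NamedTuple


class ExampleOptions(NamedTuple):
    """Hypothetical stand-in for something like WSCleanOptions."""

    imsize: int = 6000
    robust: float = 0.5

    # This method has to be copied into every Options class,
    # since it cannot be inherited from a shared NamedTuple base.
    def with_options(self, **kwargs) -> "ExampleOptions":
        _dict = self._asdict()
        _dict.update(**kwargs)
        return ExampleOptions(**_dict)


opts = ExampleOptions()
updated = opts.with_options(imsize=4096)
print(opts, updated)
```

The original instance is left untouched and a new one is returned, which is the behaviour any replacement would need to keep.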

I am hoping to use something from the standard library, so am looking towards dataclasses. These have frozen and kw_only arguments (kw_only needs Python 3.10+), which make the resulting class immutable and init'd via keyword arguments only. There are some dataclass-to-argparse modules as well that might make life easier.

Does anyone have thoughts on this?

Maybe something like this

from dataclasses import dataclass

class BaseOptions:
    def test_method(self, *args, **kwargs):
        print(f"{self=} {args=}")

    def with_options(self, *args, **kwargs):
        assert len(args) == 0, "Positional args are not allowed"
        assert all(k in self.__dict__ for k in kwargs)
        # Merge into a new dict so the (frozen) original is not modified
        new_dict = {**self.__dict__, **kwargs}

        return type(self)(**new_dict)

@dataclass(frozen=True)
class Options(BaseOptions):
    a: int

a = Options(a=11)
a.test_method('a,', 'b', 2345)

# with_options returns a new instance; a itself is unchanged
b = a.with_options(a="something completely different")
b.test_method('a,', 'b', 2345)
a.a = "ERROR" # this will error out

Doing something similar with pydantic

from pydantic import BaseModel
from typing import Union

class OptionsModel(BaseModel):
    model_config = dict(frozen=True)

    def with_options(self, *args, **kwargs):
        assert len(args) == 0, "Positional args are not allowed"
        assert all([k in self.__dict__ for k in kwargs])
        copy_dict = self.__dict__.copy()
        copy_dict.update(kwargs)

        return self.__class__(**copy_dict)

class TestOptions(OptionsModel):

    a: Union[int, float]
    b: float = 1.23

aa = TestOptions(a=1.234)
print(aa)

bb = aa.with_options(a=4)
bb.a = 1 # error

I think I like this approach a little more than dataclasses. Although pydantic is not in the stdlib, it is already a dependency through prefect. The neater thing with this approach is that we can set frozen=True in the model_config of the base model class, and it is carried forward to the models we subclass from it.

The additional validation and casting it offers based on the types is also neat.
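A small sketch of that validation and casting, assuming pydantic v2 in its default (lax) mode. ExampleOptions and its fields are hypothetical.

```python
from pathlib import Path

from pydantic import BaseModel, ConfigDict, ValidationError


class ExampleOptions(BaseModel):
    """Hypothetical options model, for illustration only."""

    model_config = ConfigDict(frozen=True)

    ms: Path
    imsize: int = 6000


# Strings are cast to the annotated types where that is unambiguous
opts = ExampleOptions(ms="example.ms", imsize="4096")
print(repr(opts.ms), repr(opts.imsize))

# Values that cannot be cast raise a ValidationError
try:
    ExampleOptions(ms="example.ms", imsize="not a number")
    rejected = False
except ValidationError as exc:
    rejected = True
    print(exc.error_count(), "validation error(s)")
```

This matters for the CLI case in particular, since argparse hands everything over as strings unless a type is given.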

Next I will look at how well either approach integrates with:

- pulling options from the strategy yaml file, and
- building argparse argument parsers
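For the dataclass route, a stdlib-only sketch of the argparse side might look like the following. The helper names (add_dataclass_to_parser, create_options_from_args) and ExampleOptions are hypothetical, and it assumes the field annotations are plain callables such as int or str (so no `from __future__ import annotations` and no default_factory fields).

```python
from argparse import ArgumentParser
from dataclasses import MISSING, dataclass, fields


@dataclass(frozen=True)
class ExampleOptions:
    """Hypothetical options class, for illustration only."""

    ms: str
    imsize: int = 6000
    make_big: bool = False


def add_dataclass_to_parser(parser, options_class):
    for field in fields(options_class):
        flag = f"--{field.name.replace('_', '-')}"
        if field.default is MISSING:
            # No default: treat as a required positional argument
            parser.add_argument(field.name, type=field.type)
        elif field.type is bool:
            # Booleans become flags that flip the default
            action = "store_false" if field.default else "store_true"
            parser.add_argument(flag, action=action)
        else:
            parser.add_argument(flag, type=field.type, default=field.default)
    return parser


def create_options_from_args(namespace, options_class):
    values = vars(namespace)
    # Only pick out the entries that belong to this options class
    return options_class(**{f.name: values[f.name] for f in fields(options_class)})


parser = add_dataclass_to_parser(ArgumentParser(), ExampleOptions)
args = parser.parse_args(["example.ms", "--imsize", "4096", "--make-big"])
opts = create_options_from_args(args, ExampleOptions)
print(opts)
```

Unlike pydantic, nothing here validates the values beyond what argparse's type conversion does, so that trade-off is worth keeping in mind.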

So continuing down the pydantic avenue, here is a neat-ish way to build upon an existing argument parser, add arguments drawn from the model, and recreate the model

from argparse import ArgumentParser
from pathlib import Path
from pydantic import BaseModel, ConfigDict

class BaseOptions(BaseModel):
    model_config = ConfigDict(frozen=True, use_attribute_docstrings=True)

class WSCleanOptions(BaseOptions):
    ms: Path
    """This is the path to the measurement set"""
    imsize: int = 6000
    """The size of an image"""
    make_big: bool = False
    """Make the image larger"""

def add_pydantic_model_to_parser(parser: ArgumentParser, options_class) -> ArgumentParser:

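    for name, field in options_class.model_fields.items():
        # Required fields become positionals and keep the field name as-is, so the
        # argparse dest matches the model field; optional ones become --hyphenated flags
        field_name = name if field.is_required() else f"--{name.replace('_', '-')}"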

        field_default = field.default
        action = 'store'
        if field.annotation is bool:
            action = 'store_false' if field.default else 'store_true'

        parser.add_argument(
            field_name, help=field.description, action=action, default=field_default
        )

    return parser

def create_options_from_parser(parser_namespace, options_to_init):
    args = vars(parser_namespace) if not isinstance(parser_namespace, dict) else parser_namespace

    opts_dict = {}
    for name, field in options_to_init.model_fields.items():
        opts_dict[name] = args[name]

    return options_to_init(**opts_dict)

if __name__ == "__main__":
    parser = ArgumentParser(description="Example CLI with a pydantic model")

    parser.add_argument("--something-else", default=123, type=float, help="Unrelated to options")

    parser = add_pydantic_model_to_parser(parser=parser, options_class=WSCleanOptions)
    args = parser.parse_args()

    print(args)

    b = create_options_from_parser(parser_namespace=args, options_to_init=WSCleanOptions)
    print(b)

Running it with something like python test_pydatnic.py example --make-big produces the following:

Namespace(something_else=123, ms='example', imsize=6000, make_big=True)
ms=PosixPath('example') imsize=6000 make_big=True

I am not really sure how hacky this is. My intended use case is to define other arguments, extract whatever options are required for some arbitrary model, and bam. If the model is updated, so is the CLI.

tjgalvin / flint

Alternatives to `NamedTuples` for `Options` #181