
💡 write type-safe and elegant CLIs with a clear separation of concerns.
https://typed-argparse.github.io/typed-argparse/
MIT License

Feature Request: init-only arguments #50

Closed HeinrichAD closed 1 year ago

HeinrichAD commented 1 year ago

Thanks for the great project 👍

As far as I can see, the `TypedArgs` class uses `typing.dataclass_transform`, which seems to work differently from `dataclasses.dataclass`.

I would like to propose adding support for `InitVar` (init-only fields/attributes).
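For reference, this is how `InitVar` behaves in a plain `dataclasses.dataclass` (standard library behavior): the value is accepted by `__init__` and forwarded to `__post_init__`, but not stored as a field.

```python
from dataclasses import dataclass, InitVar


@dataclass
class Config:
    base: int
    bonus: InitVar[int] = 10  # init-only: consumed by __post_init__, not stored

    def __post_init__(self, bonus: int) -> None:
        self.total = self.base + bonus  # derived attribute


c = Config(base=5)
print(c)        # Config(base=5) -- neither 'bonus' nor 'total' shows up as a field
print(c.total)  # 15
```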

It would be great to be able to do something like this:

from dataclasses import InitVar

import typed_argparse as tap


class Args(tap.TypedArgs):
    a: int
    b: int = tap.arg(default=42, help="help b")
    c: InitVar[int] = tap.arg(default=4711, help="help c")  #  init-only field/attribute
    e: int = tap.arg(default=42, help="help e")

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        print("init:", kwargs)  # init: {'a': 5, 'b': 42, 'c': 4711, 'e': 42}
        self.d: int = self.a + self.b + kwargs["c"]  # new attribute
        self.e: str = str(self.e)  # change attribute type
A quick and dirty version (for testing) could look like this in `typed_argparse/type_utils.py`:

```diff
 import sys
+from dataclasses import InitVar
 from enum import Enum

 [...]

 class TypeAnnotation:
     def __init__(self, raw_type: RawTypeAnnotation):
-        self.raw_type: RawTypeAnnotation = raw_type
+        self.raw_type: RawTypeAnnotation = raw_type.type if isinstance(raw_type, InitVar) else raw_type
         self.origin = _get_origin(raw_type)
         self.args = _get_args(raw_type)
```

After these little changes we are able to work with the following code snippet, which unfortunately does not yet work perfectly:

```python
#!/usr/bin/env python3
from dataclasses import dataclass, InitVar

import typed_argparse as tap


@dataclass
class Args(tap.TypedArgs):
    a: int
    b: int = tap.arg(default=42, help="help b")
    c: InitVar[int] = tap.arg(default=4711, help="help c")  # init-only field/attribute
    e: int = tap.arg(default=42, help="help e")

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        print("init:", kwargs)  # ✔ init: {'a': 5, 'b': 42, 'c': 4711, 'e': 42}
        self.d: int = self.a + self.b + kwargs["c"]  # new attribute
        self.e: str = str(self.e)  # change attribute type


def run_test(args: Args) -> None:
    print("Running test:")
    print(args)           # ✘ Args(a=5, b=42, e='42')
                          #   d=4758 should be present
    print(args.d)         # ✔ 4758 (5 + 42 + 4711)
    print(type(args.d))   # ✔ int
    print(type(args.e))   # ✔ str
    print(args.__dict__)  # ✘ {'a': 5, 'b': 42, 'c': 4711, 'e': '42', 'd': 4758}
                          #   'c': 4711 should not be present here


if __name__ == "__main__":
    tap.Parser(Args).bind(run_test).run(["--a", "5"])
```

The problems are:

```python
print(args)           # ✘ Args(a=5, b=42, e='42')
                      #   d=4758 should be present
print(args.__dict__)  # ✘ {'a': 5, 'b': 42, 'c': 4711, 'e': '42', 'd': 4758}
                      #   'c': 4711 should not be present here
```

But at least the IntelliSense/code completion is working (setup: VS Code + mypy):

- `c` is not shown
- `d` is visible
- `e` has the correct type

![vscode IDE example](https://github.com/typed-argparse/typed-argparse/assets/5962361/73acd19e-45bc-4d4d-aa5d-72f85adb61c9)
bluenote10 commented 1 year ago

Thanks for the idea!

Actually I'd like to switch from a parsing `__init__` constructor (i.e., one which gets an `argparse.Namespace` instance) to a non-parsing / fully typed `__init__`. This has the benefit that the classes can be used like regular dataclass-like classes, which simply forward all their arguments in the `__init__` (if the `__init__` is type safe, there is no need for validation there thanks to type checking). I need to revisit the dataclass transform specs and how all this could play together.
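For context, PEP 681's `dataclass_transform` lets a library mark a base class so that type checkers synthesize a dataclass-style `__init__` for its subclasses; a minimal sketch of the mechanism (hypothetical base class, not typed-argparse's actual implementation):

```python
from typing import Any

from typing_extensions import dataclass_transform  # typing.dataclass_transform in Python 3.11+


@dataclass_transform(kw_only_default=True)
class Base:
    def __init__(self, **kwargs: Any) -> None:
        # Runtime side of the contract: store the keyword arguments.
        # (A real implementation would also apply defaults and validate.)
        for name, value in kwargs.items():
            setattr(self, name, value)


class Point(Base):
    x: int
    y: int = 0


p = Point(x=1, y=2)  # type checkers infer this keyword-only signature from the annotations
```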

Note that using conflicting type annotations, as is done for `e`, is probably not such a good idea. Does the type annotation now basically lie about the actual type?

HeinrichAD commented 1 year ago

> Actually I'd like to switch from a parsing `__init__` constructor (i.e., one which gets an `argparse.Namespace` instance) to a non-parsing / fully typed `__init__`. This has the benefit that the classes can be used like regular dataclass-like classes, which simply forward all their arguments in the `__init__` (if the `__init__` is type safe, there is no need for validation there thanks to type checking). I need to revisit the dataclass transform specs and how all this could play together.

In general, that's a good idea. But I don't think that this modification/remodeling will be easy. Looking at PEP 557 (Data Classes) and PEP 681 (Data Class Transforms), they just seem to be different. I should also look a bit deeper into both of them, as well as alternatives such as attrs.

> Note that using conflicting type annotations, as is done for `e`, is probably not such a good idea.

I totally agree, but that was the only way I found to handle more complex data types.

A simple example would be an `Enum`, which could be handled along with its choices if `tap.arg` were extended to pass these arguments on to the `parser.add_argument` call.
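To illustrate what such a pass-through could generate under the hood, here is a hypothetical desugaring with plain argparse (illustration only, not typed-argparse code):

```python
import argparse
from enum import Enum


class Datasets(Enum):
    MNIST = 1
    CIFAR10 = 2

    def __str__(self):
        return self.name


parser = argparse.ArgumentParser()
parser.add_argument(
    "-d", "--dataset",
    type=lambda name: Datasets[name],  # convert the raw string into an enum member
    choices=list(Datasets),            # restricts input; __str__ controls the help display
    default=Datasets.MNIST,
)
args = parser.parse_args(["--dataset", "CIFAR10"])
print(args.dataset)  # CIFAR10 (i.e. Datasets.CIFAR10, printed via __str__)
```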

A more complex and not so easy case would be when multiple arguments together build a single object argument/attribute/field. Example:

from typing import Optional

import typed_argparse as tap


class Dataset:
    def __init__(self, batch_size: int, seed: Optional[int] = None):
        ...

class Args(tap.TypedArgs):
    batch_size: int
    seed: Optional[int] = tap.arg(default=None)
    dataset: str  # here it would be great to use an enum or even a dict instead

The desired result would naturally be that `dataset` ends up as an instance of `Dataset`, created from the other arguments. But how?
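One manual way to get there, independent of any parser integration, is a small factory over the parsed fields (the `DATASETS` registry and `make_dataset` helper are hypothetical names used for illustration):

```python
from typing import Dict, Optional, Type


class Dataset:
    def __init__(self, batch_size: int, seed: Optional[int] = None):
        self.batch_size = batch_size
        self.seed = seed


class MNIST(Dataset):
    pass


class CIFAR10(Dataset):
    pass


# hypothetical registry mapping the CLI string onto a Dataset subclass
DATASETS: Dict[str, Type[Dataset]] = {"MNIST": MNIST, "CIFAR10": CIFAR10}


def make_dataset(name: str, batch_size: int, seed: Optional[int] = None) -> Dataset:
    return DATASETS[name](batch_size, seed)


dataset = make_dataset("CIFAR10", batch_size=128)  # e.g. from args.dataset and args.batch_size
```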

> Does the type annotation now basically lie about the actual type?

I don't think so. If you look at the example I provided in the original post (inside the `<details>...</details>` block), Python confirms that at runtime the type is `str`. And if you look at the output of `print("init:", kwargs)` (before `e` is converted into a string), you see that it really is a number.

HeinrichAD commented 1 year ago

I've looked a little deeper into the mentioned issues, and maybe this isn't the direction you want, but maybe it will help anyway. Note: These examples are based on dataclasses.dataclass and are not yet linked to typed-argparse.

`__init__` type check

First, there is a great post by Mathias Ettinger on Stack Overflow: https://stackoverflow.com/a/50622643. There he shows, among other things, a decorator which enforces type hints at runtime. This could easily be extended to also support `dataclasses.InitVar[…]` types.

_An adapted version of the `@enforce_types` code can be found inside the "Complex example" below._

This decorator makes it easy to write dataclasses with forced type checking:

@enforce_types
@dataclass
class Args:
    name: str
    age: int = 42
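Given the `Args` class above (and the adapted `enforce_types` from the "Complex example" below), a wrong argument type then fails at construction time:

```python
Args(name="Alice", age=42)    # ok
Args(name="Alice", age="42")  # raises TypeError: Unexpected type for 'age'
                              # (expected <class 'int'> but found <class 'str'>)
```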

Argument field flexibility

It would be great to be able to manipulate or further specify the argparse arguments. This could enable the possibility to e.g. set choices or a long and a short version (e.g. `-d` and `--dataset`) of the argument itself. `dataclasses.field` has an additional attribute `metadata` which can be used freely.

Example

A simple enum would be an illustrative example.

import typing
from dataclasses import dataclass, field, InitVar
from enum import Enum

class Datasets(Enum):
    MNIST = 1
    CIFAR10 = 2

    def __str__(self):
        return self.name

@dataclass
class Args:
    batch_size: int = 512
    seed: typing.Optional[int] = field(default=42)
    _dataset: InitVar[str] = field(default="MNIST", metadata=dict(args=["-d", "--dataset"], choices=list(Datasets)))
    dataset: Datasets = field(init=False)

    def __post_init__(self, _dataset: str):
        self.dataset: Datasets = Datasets[_dataset]

Or maybe even the type conversion could be supported:

@dataclass
class Args:
    batch_size: int = field(default=512)
    seed: typing.Optional[int] = field(default=42)
    dataset: Datasets = field(default=Datasets.MNIST, metadata=dict(
        args=["-d", "--dataset"], type=lambda ds: Datasets[ds], choices=list(Datasets)
    ))
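A sketch of glue code that could consume such metadata (the `build_parser` helper is hypothetical, not part of typed-argparse; it iterates `__dataclass_fields__`, which, unlike `dataclasses.fields()`, also contains the `InitVar` pseudo-fields in CPython):

```python
import argparse
from dataclasses import MISSING


def build_parser(cls) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for f in cls.__dataclass_fields__.values():
        if not f.init:
            continue  # skip init=False fields such as 'dataset' above
        names = f.metadata.get("args", ["--" + f.name.lstrip("_")])
        kwargs = {key: f.metadata[key] for key in ("type", "choices") if key in f.metadata}
        if f.default is not MISSING:
            kwargs["default"] = f.default
        # dest=f.name keeps the namespace keys aligned with the field names,
        # even when metadata provides flags like ["-d", "--dataset"]
        parser.add_argument(*names, dest=f.name, **kwargs)
    return parser


# args = Args(**vars(build_parser(Args).parse_args()))
# (a real implementation would also derive `type` from the field annotations)
```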

Edge cases

There are some edge cases which should also be considered:

Complex example

This is a somewhat more complex (but runnable) example, which in my opinion already covers many cases.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from dataclasses import dataclass, field, InitVar
from enum import Enum
import logging
import inspect
import typing
from contextlib import suppress
from functools import wraps


#######################################################################################################################
# Adapted enforce_types code #
##############################

def enforce_types(callable):
    """
    Decorator that enforces type hints at runtime.
    Source: https://stackoverflow.com/a/50622643
    Changes: add dataclasses.InitVar support
    """
    spec = inspect.getfullargspec(callable)

    def check_types(*args, **kwargs):
        parameters = dict(zip(spec.args, args))
        parameters.update(kwargs)
        for name, value in parameters.items():
            with suppress(KeyError):  # Assume un-annotated parameters can be any type
                type_hint = spec.annotations[name]
                if isinstance(type_hint, typing._SpecialForm):
                    # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                    continue
                try:
                    actual_type = type_hint.__origin__
                except AttributeError:
                    # In case of non-typing types (such as <class 'int'>, for instance)
                    actual_type = type_hint
                # In Python 3.8 one would replace the try/except with
                # actual_type = typing.get_origin(type_hint) or type_hint
                if isinstance(actual_type, typing._SpecialForm):
                    # case of typing.Union[…] or typing.ClassVar[…]
                    actual_type = type_hint.__args__
                if isinstance(actual_type, InitVar):
                    # case of dataclasses.InitVar[…]
                    actual_type = actual_type.type
                if not isinstance(value, actual_type):
                    raise TypeError("Unexpected type for '{}' (expected {} but found {})".format(name, type_hint, type(value)))

    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            check_types(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)


#######################################################################################################################
# COMPLEX EXAMPLE #
###################

class Dataset:
    def __init__(self, batch_size: int, seed: typing.Optional[int] = None):
        self.batch_size = batch_size
        self.seed = seed

class MNIST(Dataset):
    pass

class CIFAR10(Dataset):
    pass

class Datasets(Enum):
    MNIST = MNIST
    CIFAR10 = CIFAR10

    def __str__(self):
        return self.name

@enforce_types
@dataclass(kw_only=True)
class Args:
    batch_size: int = field(default=512)
    seed: typing.Optional[int] = field(default=42)
    _dataset: InitVar[str] = field(default="MNIST", metadata=dict(args=["--dataset"], choices=list(Datasets)))
    # or even more convenient, so that __post_init__ already gets an enum object:
    # _dataset: InitVar[Datasets] = field(default=Datasets.MNIST, metadata=dict(
    #     args=["--dataset"], type=lambda ds: Datasets[ds], choices=list(Datasets)
    # ))
    dataset: Dataset = field(init=False)
    log_name: InitVar[str] = field(default="args.test")
    logger: logging.Logger = field(init=False)

    def __post_init__(self, _dataset: str, log_name: str):
        self.dataset: Dataset = Datasets[_dataset].value(self.batch_size, self.seed)
        self.logger = logging.getLogger(log_name)


if __name__ == "__main__":
    a = Args(batch_size=128, seed=None, _dataset="CIFAR10", log_name="hello.world")
    print(a)
    print(type(a.dataset))
    dataset: Dataset = a.dataset
```

#### Output

```
Args(batch_size=128, seed=None, dataset=<__main__.CIFAR10 object at 0x7f387c47ddd0>, logger=<Logger hello.world (WARNING)>)
<class '__main__.CIFAR10'>
```

Note: There is also no issue if I let mypy check this code.
bluenote10 commented 1 year ago

> In general, that's a good idea. But I don't think that this modification/remodeling will be easy.

In fact, that is already implemented: in combination with setting `kw_only_default=True`, these two lines are doing the job:

https://github.com/typed-argparse/typed-argparse/blob/0b672c0c26c39aea5c3d8f40c3b2c2cdc7bb046b/typed_argparse/typed_args.py#L27-L28

And this backwards-compatibility hack will be removed soon, to really stick to the data class transform convention:

https://github.com/typed-argparse/typed-argparse/blob/0b672c0c26c39aea5c3d8f40c3b2c2cdc7bb046b/typed_argparse/typed_args.py#L23-L25

The construction internally always goes through the `from_argparse` method, not the backwards-compatibility hack in the constructor.


In general I don't really understand your motivating example. typed-argparse already has direct support for `Enum`, so you could use the `Datasets` type directly, without any `InitVar` workarounds.
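A sketch of that direct usage (assuming the Enum support mentioned here; the exact CLI spelling of the enum values may differ):

```python
from enum import Enum

import typed_argparse as tap


class Datasets(Enum):
    MNIST = "mnist"
    CIFAR10 = "cifar10"


class Args(tap.TypedArgs):
    dataset: Datasets = tap.arg(default=Datasets.MNIST, help="which dataset to use")
```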


You may have noticed that one of the motivations to develop typed-argparse has been to establish a clean separation of concerns between argument parsing and business logic. From that perspective an `InitVar` does not fit in nicely. Basically, the type that inherits from `TypedArgs` should express all arguments that the CLI has; its purpose is to express exactly that. This purpose should not be stretched too much, i.e., I don't think it is particularly clean to bake very non-trivial transformations of the arguments into the class itself (because then the class no longer represents the actual arguments). For anything non-trivial I would rather suggest using composition to transform the actual args into a modified representation. For instance, your initial example could look something like this:

# This class represents the actual arguments of the app
class ArgsRaw(tap.TypedArgs):
    a: int
    b: int = tap.arg(default=42, help="help b")
    c: int = tap.arg(default=4711, help="help c")
    e: int = tap.arg(default=42, help="help e")

# Wrapping it allows to do any kind of non-trivial transformation
class Args:
    def __init__(self, args_raw: ArgsRaw):
        self.a = args_raw.a
        self.b = args_raw.b
        self.d = args_raw.a + args_raw.b + args_raw.c
        self.e = str(args_raw.e)  # change attribute type

This is fully type safe and has a clear separation of concerns. Note that for rather "lightweight derived fields", using e.g. a property on the original args type can be perfectly fine. So for instance, if it was only about exposing d as the sum of the other args, you could simply do this:

# This class represents the actual arguments of the app
class ArgsRaw(tap.TypedArgs):
    a: int
    b: int = tap.arg(default=42, help="help b")
    c: int = tap.arg(default=4711, help="help c")

    @property
    def d(self) -> int:
        return self.a + self.b + self.c

I'd only go for a wrapper if the transformation is highly non-trivial.


I still think that what you are doing here, annotating `e` as int but assigning it a str, is broken. Code like this:

class Test:
    x: int

    def __init__(self):
        self.x: str = "a string"

neither type checks in mypy (assuming `check_untyped_defs` or strict settings) nor satisfies pyright:

*(screenshots: the corresponding mypy and pyright errors)*

Which makes sense, right? Otherwise we wouldn't be using the type system to avoid bugs; rather, the type system would force us into bugs if we purposefully give a variable a different type at runtime than what it is annotated with for type checking.

HeinrichAD commented 1 year ago

Thanks for the detailed answer. I think your design decisions are correct!

You may be right that my use case is somewhat special. I will have to think about using composition, which I wanted to avoid. But this should only be a special case; most cases should be covered by normal attributes, properties, and class methods.

With this I also got an answer to my original question/feature request. Therefore, I will close this issue.

Personally, I feel the need to rethink some of my special cases and look for a solution that I'm happy with.

I still think that the edge cases I described previously, in particular the second one, are still valid. If I use a nullable type with a default value which is not None, e.g. `seed: Optional[int] = field(default=42)`, how can I set it to None via the CLI? Maybe it is just not possible, and it has to be set manually each time.

Again thank you for your time and answers as well as for the reminder of the KISS principle.

bluenote10 commented 1 year ago

One possibility to model this "explicitly force a defaulted field back to None" would be:

from typing import Optional

import typed_argparse as tap

class Args(tap.TypedArgs):
    seed: int = tap.arg(
        "--seed",
        default=42,
        help="Allows to specify a seed value. You can use `--no-seed` to explicitly disable a seed.",
    )
    no_seed: bool = tap.arg(
        "--no-seed",
        help="Explicitly disables a seed.",
    )

    @property
    def seed_opt(self) -> Optional[int]:
        return None if self.no_seed else self.seed

def run(args: Args):
    print("args:", args)
    print("optional seed:", args.seed_opt)

if __name__ == "__main__":
    tap.Parser(Args).bind(run).run()

The help text would become:

usage: test.py [-h] [--seed SEED] [--no-seed]

options:
  -h, --help   show this help message and exit
  --seed SEED  Allows to specify a seed value. You can use `--no-seed` to explicitly disable a seed. [default: 42]
  --no-seed    Explicitly disables a seed.

And usage would look like this:

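A representative session (assuming the `Args(...)` repr format seen earlier in this thread):

```
$ python test.py
args: Args(seed=42, no_seed=False)
optional seed: 42

$ python test.py --no-seed
args: Args(seed=42, no_seed=True)
optional seed: None
```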


The following would be an alternative, but it doesn't type check yet, because I seem to be missing an appropriate overload to support it. I.e., for now you'd need a `# type: ignore`, but I would hope it could be fully supported by incorporating it into the overloads.

from typing import Optional

import typed_argparse as tap

def _parse_seed(s: str) -> Optional[int]:
    if s.lower() == "none":
        return None
    else:
        return int(s)

class Args(tap.TypedArgs):
    seed: Optional[int] = tap.arg(
        "--seed",
        type=_parse_seed,
        default=42,
        help="Allows to specify a seed value. Specify `none` to explicitly disable a seed.",
    )

def run(args: Args):
    print("args:", args)

if __name__ == "__main__":
    tap.Parser(Args).bind(run).run()
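With a `# type: ignore` on the `seed` argument, invocations would then behave like this (representative output, same repr assumption as above):

```
$ python test.py --seed 7
args: Args(seed=7)

$ python test.py --seed none
args: Args(seed=None)
```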