What's the best way to create a parser from a Pydantic model?

kddubey commented 9 months ago

I already have a Pydantic model which specifies argument names, default values, types, and descriptions. What's the best way to create a subclass of Tap which uses these Pydantic fields?

My current solution might be kinda hacky. It took some trial and error. Lmk what you think.

```python
from pydantic import BaseModel, Field
from tap import Tap


def _tap_from_pydantic_model(model: type[BaseModel]) -> type[Tap]:
    class ArgParser(Tap):
        def configure(self):
            for name, field in model.model_fields.items():
                self._annotations[name] = field.annotation
                self.class_variables[name] = {"comment": field.description or ""}
                # The help string will be nicely constructed in _add_argument
                if field.is_required():
                    kwargs = {}
                else:
                    kwargs = dict(required=False, default=field.default)
                self.add_argument(f"--{name}", **kwargs)

    return ArgParser
```

Here's a demo

```python
# demo.py
from pydantic import BaseModel, Field
from tap import Tap


def _tap_from_pydantic_model(model: type[BaseModel]) -> type[Tap]:
    class ArgParser(Tap):
        def configure(self):
            for name, field in model.model_fields.items():
                self._annotations[name] = field.annotation
                self.class_variables[name] = {"comment": field.description or ""}
                # The help string will be nicely constructed in _add_argument
                if field.is_required():
                    kwargs = {}
                else:
                    kwargs = dict(required=False, default=field.default)
                self.add_argument(f"--{name}", **kwargs)

    return ArgParser


class Model(BaseModel):
    """
    My Pydantic Model which contains script args.
    """

    arg_str: str = Field(description="hello")
    arg_bool: bool = Field(default=True, description=None)
    arg_list: list[str] | None = Field(default=None, description="optional list")


def main(model: Model) -> None:
    print("Parsed args into Model:")
    print(model)


if __name__ == "__main__":
    ModelTap = _tap_from_pydantic_model(Model)
    args = ModelTap(description="Script description").parse_args()
    model = Model(**args.as_dict())
    main(model)
```

Help message is nice:

```bash
$ python demo.py -h
usage: demo.py --arg_str ARG_STR [--arg_bool] [--arg_list [ARG_LIST ...]] [-h]

Script description

options:
  --arg_str ARG_STR     (str, required) hello
  --arg_bool            (bool, default=True)
  --arg_list [ARG_LIST ...]
                        (list[str] | None, default=None) optional list
  -h, --help            show this help message and exit
```

Running it

```bash
python demo.py \
--arg_str test \
--arg_list x y z \
--arg_bool
```

outputs:

```
Parsed args into Model:
arg_str='test' arg_bool=False arg_list=['x', 'y', 'z']
```
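Note that `arg_bool` comes out as `False` in the demo output even though `--arg_bool` was passed. That is what one would expect if a boolean field whose default is `True` is turned into a flag that switches the value off. The snippet below is a minimal stdlib `argparse` sketch of that behavior. It is an assumption about what Tap does for such fields, not code taken from Tap.

```python
import argparse

parser = argparse.ArgumentParser()
# A boolean whose default is True: passing the flag stores False.
parser.add_argument("--arg_bool", action="store_false", default=True)

flag_omitted = parser.parse_args([]).arg_bool
flag_passed = parser.parse_args(["--arg_bool"]).arg_bool

print(flag_omitted)  # True
print(flag_passed)   # False
```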

An alternate solution like this one, where kwargs are explicitly provided to .add_argument, ends up bypassing (by design) all of the niceties in ._add_argument, e.g., type unboxing and constructing a typed help string.
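To illustrate what the explicit-kwargs route has to do by hand, here is a minimal sketch that uses only the standard library: a builtin dataclass stands in for the Pydantic model and plain `argparse` stands in for Tap. The helper name `parser_from_dataclass` and the `description` metadata key are made up for this example. The type unboxing and help text that `._add_argument` would otherwise take care of are spelled out explicitly.

```python
import argparse
import dataclasses
from typing import List, Optional, Union, get_args, get_origin


@dataclasses.dataclass
class Args:
    # "description" is a hypothetical metadata key used only in this sketch.
    arg_str: str = dataclasses.field(metadata={"description": "hello"})
    arg_list: Optional[List[str]] = dataclasses.field(
        default=None, metadata={"description": "optional list"}
    )


def parser_from_dataclass(cls) -> argparse.ArgumentParser:
    """Build an argparse parser by passing every kwarg explicitly."""
    parser = argparse.ArgumentParser()
    for f in dataclasses.fields(cls):
        kwargs = {"help": f.metadata.get("description", "")}
        tp = f.type
        # Unbox Optional[X] to X.
        if get_origin(tp) is Union:
            non_none = [a for a in get_args(tp) if a is not type(None)]
            if len(non_none) == 1:
                tp = non_none[0]
        # Unbox List[X] to nargs="*" with element type X.
        if get_origin(tp) is list:
            kwargs["nargs"] = "*"
            (tp,) = get_args(tp)
        kwargs["type"] = tp
        # Fields using default_factory are not handled in this sketch.
        if f.default is dataclasses.MISSING:
            kwargs["required"] = True
        else:
            kwargs["default"] = f.default
        parser.add_argument(f"--{f.name}", **kwargs)
    return parser


parsed = parser_from_dataclass(Args).parse_args(
    ["--arg_str", "test", "--arg_list", "x", "y"]
)
print(Args(**vars(parsed)))
```

Every new kind of annotation (booleans, literals, nested containers) would need another branch here, which is the work the snippet in the original post avoids by handing annotations to Tap.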

Question

Would you consider supporting a Tap.from_pydantic_model init method? It seems desirable.

Regardless, thank you for this awesome package :-)

kddubey commented 9 months ago

My module here contains an implementation which works for a:

- Pydantic BaseModel (class or instance)
- builtin dataclass (class or instance)
- Pydantic dataclass (class or instance).

Lmk if you're open to a PR for a Tap.from_data_model initialization method.

martinjm97 commented 8 months ago

Hi @kddubey,

Wow! This is so cool! Thank you for all the work you put in to integrating with Pydantic. Having both typed argument parsing and data validation looks like a big win for typed Python!

We'd absolutely love a PR on this!

We see your code is already well documented. A PR with this level of complexity would probably benefit from significant testing. We're happy to support you in making this happen!

--JK

kddubey commented 8 months ago

I was skimming through the docs and learned that this functionality is pretty similar to tapify.

Differences:

- tapify is slightly more generic b/c it inspects the input's signature. tap_class_from_data_model only works for builtin dataclasses, Pydantic BaseModels, and Pydantic dataclasses.
- tapify doesn't show the field's description in the -h help message. Again, this is b/c tapify inspects the input's signature, while tap_class_from_data_model grabs field info from the data model.
- tap_class_from_data_model returns a Tap class. tapify instead initializes a Tap instance, and calls parse_args() and .as_dict() for you (returning an instance of the input if it was a class, or running it if the input is a function), which is usually all you need. The advantage to returning a class is that you can add more arguments or special behavior by overriding the configure and process_args methods.

I'll open a PR with what I have and you can decide how or whether to merge some of tap_class_from_data_model into tapify. I'm thinking about a refactor which standardizes how argument data is pulled from an object.
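One way to picture such a refactor, purely as a sketch and not what the PR does: normalize every input into a common record first, then build arguments from those records. `ArgSpec`, `specs_from_dataclass`, and `specs_from_signature` are hypothetical names, the `description` metadata key is an assumption, and only the standard library is used.

```python
import dataclasses
import inspect
from typing import Any, Callable, List


@dataclasses.dataclass
class ArgSpec:
    """Hypothetical common record describing one argument."""
    name: str
    annotation: Any
    required: bool
    default: Any = None
    description: str = ""


def specs_from_dataclass(cls: type) -> List[ArgSpec]:
    """Pull argument data from the fields of a builtin dataclass."""
    specs = []
    for f in dataclasses.fields(cls):
        has_default = f.default is not dataclasses.MISSING
        has_factory = f.default_factory is not dataclasses.MISSING
        default = f.default if has_default else (
            f.default_factory() if has_factory else None
        )
        specs.append(ArgSpec(
            name=f.name,
            annotation=f.type,
            required=not (has_default or has_factory),
            default=default,
            description=f.metadata.get("description", ""),
        ))
    return specs


def specs_from_signature(func: Callable) -> List[ArgSpec]:
    """Pull argument data from a callable's signature (no descriptions)."""
    specs = []
    for p in inspect.signature(func).parameters.values():
        required = p.default is inspect.Parameter.empty
        specs.append(ArgSpec(
            name=p.name,
            annotation=p.annotation,
            required=required,
            default=None if required else p.default,
        ))
    return specs


@dataclasses.dataclass
class Config:
    lr: float = dataclasses.field(
        default=0.1, metadata={"description": "learning rate"}
    )


def train(lr: float = 0.1) -> None:
    pass


from_model = specs_from_dataclass(Config)
from_func = specs_from_signature(train)
print(from_model[0].description)  # learning rate
print(from_func[0].description == "")  # True
```

The two extractors agree on name, annotation, and default, and differ only in the description, which mirrors the second difference listed above: a signature carries no field descriptions, while a data model can.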
