`to_tap_class`, and inspect fields instead of signature for data models

kddubey commented 8 months ago

This is a WIP following up on #125. It should also close #112.

2 advantages to using tap_class_from_data_model instead of tapify:

Gets field descriptions for the -h help string
Returns a Tap class, which allows the user to override configure and process_args. See the example here.

Future refactor

I'm thinking we can allow tapify to return a Tap class instead of calling the input object.

from tap_class_from_data_model import _tap_class  # in this PR

def tapify(class_or_function, return_tap_class: bool=False, ...):
    fields_data = ... 
    # from inspecting the signature of class_or_function (as already done in tapify),
    # or by extracting field data from class_or_function if it's a data model (to get
    # the field descriptions)
    tap_class = _tap_class(fields_data)
    if return_tap_class:
        return tap_class
    # rest of tapify as before
    tap = tap_class(description=description, explicit_bool=explicit_bool)
    command_line_args = tap.parse_args()
    command_line_args_dict = command_line_args.as_dict()
    return class_or_function(**command_line_args_dict)

@martinjm97 What do you think of this refactor? Lmk if I should open a new issue for this, or a PR if that makes things more concrete.

For now, my plan is to focus on the todos above. The downside to merging in tap_class_from_data_model is that there would be 2 interfaces—tapify and tap_class_from_data_model—which "tapify" a Pydantic model or dataclass, when there could be 1: tapify.

martinjm97 commented 8 months ago

Hi @kddubey,

The PR looks great so far! Thank you for all the work you put in!

@swansonk14 and I will meet to discuss this PR within a week.

My opinion is that both of the advantages that you listed of tap_class_from_data_model over tapify should be incorporated into tapify. As a result, I'm inclined to put the new support for Pydantic in tapify as well.

We'll keep you posted. Thank you again!

--Jesse

martinjm97 commented 8 months ago

Hi @kddubey,

We are in support of two different functions convert_to_tap_class and tapify. Then convert_to_tap_class (a renaming of tap_class_from_data_model) produces a Tap class from a class, dataclass, function, or pydantic model and tapify would call convert_to_tap under the hood and then either instantiate the class or run the function.

Here's a version that we were working on https://github.com/swansonk14/typed-argument-parser/blob/convert_to_tap/tap/tapify.py, but we haven't had time to complete it. Any more contributions in this direction would be greatly appreciated! Thank you for what you have done already!

We'll revisit this PR in the next few weeks and find a way to get this into main!

Best, JK (Jesse and Kyle)

kddubey commented 7 months ago

@martinjm97 thank you for the feedback about convert_to_tap_class, and the typing fixes from a few days ago. I updated the PR. Current tests should be passing.

Remaining todos for me before I un-draft this PR and ask for a full review:

[x] don't require pydantic
[x] test on pydantic dataclasses and BaseModels in tests/test_tapify.py
[x] test to_tap_class (rn it's just demo'd)
[x] get this to work for pydantic v1
[x] add to README

martinjm97 commented 7 months ago

Thank you so much again! Wow!

kddubey commented 7 months ago

The PR is ready for review @martinjm97

(Vocab note: "data model" refers to builtin dataclasses, Pydantic models, and Pydantic dataclasses. If something is a data model, it means it has "fields" which can be publicly inspected to extract argument data like the type annotation, default value, and description.)

Summary of changes:

tap/tapify.py: for data models, look at the field info instead of the signature. The only benefit is that field descriptions can be used as argument descriptions, which improves -h. It might also be more canonical / robust to inspect fields instead of the signature for data models. If you'd like to keep the current behavior—inspect the signature, not fields—change _tap_data to just return _tap_data_from_class_or_function(...).
tap/tapify.py: add a new function to_tap_class which takes a class, function, data model class, or data model instance, and returns a Tap class (not instance). It allows a user to do arbitrary argument handling for any object.
tests/test_tapify.py: add tests for Pydantic data models. These should already work in main. I just added tests for them.
tests/test_to_tap_class.py: new test suite for testing tap.to_tap_class.

martinjm97 commented 7 months ago

Very sorry! @swansonk14 and I are just clearing out time to give this a proper code review. This is our number 1 priority on Tap until it's in. Thank you so much for your tremendous effort and extremely high quality work.

--Jesse

kddubey commented 7 months ago

No worries, take your time :-)

martinjm97 commented 7 months ago

Hi @kddubey,

Thank you for all the tremendous work you have put into this PR! We learned a lot from your use of pytest fixtures/parameterize and appreciated the organization of tapify and to_tap_class.

We read over all of your code and it looks great! However, when started testing we found a small bug. When we tapifyd a pydantic class and provided an unused argument, tapify did not raise an error during parsing even though it should have.

Dataclasses have the desired functionality:

from dataclasses import dataclass
from tap import Tap, tapify
@dataclass
class Model:
    id: int = 1

args = tapify(Model)
print(args.id)

>>> python demo_data_model.py --id 7 --junk 89
usage: demo_data_model.py [--id ID] [-h]
demo_data_model.py: error: unrecognized arguments: --junk 89

pydantic classes do not raise an error:

from tap import Tap, tapify
from pydantic import BaseModel
class Model(BaseModel):
    id: int = 1

args = tapify(Model)
print(args.id)

>>> python demo_data_model.py --id 7 --junk 89
7

We still want to spend more time testing your implementation before merging it into main.

Thank you so much again for all of this fantastic work! We plan to merge this in as soon as we've finished testing and (you or we) have fixed issues that arise.

Best, JK

kddubey commented 7 months ago

@martinjm97 see this comment and this comment. Note that, by default, a Pydantic BaseModel (surprisingly, to me at least) ignores extra fields that are passed to it; it doesn't raise an error. Should I change the two variables (mentioned in the first comment) to be False so that supplying extra arguments always raises an error?

martinjm97 commented 7 months ago

Sorry for missing your comments! We'll be sure to read all the comments before reporting supposed bugs!

Thanks, JK

kddubey commented 7 months ago

No worries. That comment from a week ago wasn't really clear now that I'm reading it. And I appreciate that you're testing the code independently!

kddubey commented 6 months ago

Hi @martinjm97, the last commit changes the Pydantic implementation to not allow extra arguments for Pydantic BaseModels by default. That way it's backwards compatible and sensible. Sorry for causing confusion.

I also added support for passing extra command line args to the pydantic model if the model is configured that way. This change is backwards compatible, i.e., you can already do this in main.

For example, for this script—

# demo_extra_args.py
from pydantic import BaseModel, ConfigDict
from tap import tapify

class User(BaseModel):
    model_config = ConfigDict(extra="allow")  # explicitly allow extra attributes

    name: str

if __name__ == "__main__":
    user = tapify(User)
    print(user)

—running

python demo_extra_args.py --name Tapify --number 12

prints

name='Tapify' number='12'

(Extra args get cast to string, which matches what happens in main. The user can add custom validation if they want to change this behavior.)

Removing the model_config = ... line will cause that last command to raise an error:

usage: demo_extra_args.py --name NAME [-h]
demo_extra_args.py: error: unrecognized arguments: --number 12

Lmk what you think. Thanks!

martinjm97 commented 6 months ago

Hi @kddubey,

Thank you for all of the extensive testing and bearing with us! @swansonk14 and I went through all of the examples in https://docs.pydantic.dev/latest/concepts/fields/ and did our own testing. Everything that we could think of seems to be working.

Thank you again for all your hard work and excellent execution! We will merge it in. We plan to do another pass over code to make small changes to the code style and comments before including it in the next release.

Best, JK

kddubey commented 6 months ago

Sounds good, and feel free to lmk if you need any help or clarification on this code in the future :-)

martinjm97 commented 6 months ago

Thank you! It has been a pleasure to learn from your work and to work with you. Sorry for the slow turn around. Kyle and I are so happy this landed!

--Jesse

tinkerware commented 5 months ago

I'd love to see a new release that includes this change; looking forward to using Pydantic models with tapify.

kddubey commented 5 months ago

@tinkerware tapify should already work w/ Pydantic models. It's just that the field descriptions are missing from the -h message.

martinjm97 commented 5 months ago

We've been trying to find time to get it at least above the codecov threshold before making the new release. Sorry for the delay!

kddubey commented 5 months ago

@martinjm97 Happy to help w/ that since I'm not as busy. And cuz I broke it lol :-)

I believe the root cause is that the coverage % is based on a coverage report generated by tests ran in an evironment which doesn't have pydantic installed. The best solution is to take the union of covered lines across the 3 tests in the test environments in the testing workflow (no pydantic, pydantic v1, pydantic v2), but I'm not sure if that feature is supported. Will look into it soon

martinjm97 commented 5 months ago

@kddubey, that would be absolutely spectacular if possible. No worries if it doesn't work out though. Thank you again!

martinjm97 commented 5 months ago

@tinkerware, thanks to the work of @kddubey, the implementation was prepared for release. You can see it here: https://github.com/swansonk14/typed-argument-parser/releases/tag/v_1.10.0.

--Jesse

tinkerware commented 5 months ago

Great, now I can ditch the git-ref in my poetry dependencies list! I needed the changes in this PR so that I could use to_tap_class.

Thanks again to the OP and the maintainers for all the work and effort you put into this project!

tinkerware commented 5 months ago

So, I wanted to write up a quick summary of how I'm using Pydantic and this project together to parse CLI flags, so it can be a pointer for others looking to do the same and for me to learn from comments.

I use Pydantic to parse configuration and to represent the directives of the CLI flags with a "command" abstraction:

from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

class AppConfig(BaseSettings):
  model_config = SettingsConfigDict(
        env_prefix="APP_",
        env_file='.env',
        env_file_encoding='utf-8',
        env_ignore_empty=True,
        str_strip_whitespace=True,
        revalidate_instances='always',
        frozen=True
    )

  api_key: str | None = None
  debug: bool = False
  ...

ShowOption: TypeAlias = Literal[...]
ShowInput: TypeAlias = Literal['all', ShowOption]

class AppCommand(BaseModel):
  """Makes hard things easy in a pretty way."""

  show: list[ShowInput] = Field(
    default=["sensible-default"],
    description="..."
  )
  ...

  @field_validator("show")
  @classmethod
  def validate_show(cls, show: list[ShowInput]) -> list[ShowOption]:
    if "all" in show:
      return [...]  # can expand convenience flags into underlying options
    return show

def parse_flags(*flags: str) -> tuple[AppConfig, AppCommand]:
  class AppFlags(to_tap_class(AppConfig), to_tap_class(AppCommand)):
    def configure() -> None:
      self.description = AppCommand.__doc__

   args = AppFlags().parse_args(args=flags or None).as_dict()
   return env_config(**args), AppCommand(**args)

This lets me override any configuration using CLI flags, and allows to use Pydantic validators to convert CLI flags to the data that the rest of the app expects.

Configuration overrides from CLI flags look something like this (e.g. the env_config function above):

try:
    _ENV_CONFIG = AppConfig()
except ValidationError as e:
    fail("Ensure all `APP_*` environment variables are set correctly: {}", e)

def env_config(**overrides: Any) -> AppConfig:
    if not overrides:
        return _ENV_CONFIG

    # The actual changes are overrides that are different from default values.
    changes = (
        _ENV_CONFIG
        .model_copy(update=overrides)
        .model_dump(exclude_defaults=True)
    )
    return AppConfig.model_validate(
      _ENV_CONFIG.model_copy(update=changes)
    )

This lets me pass a --debug flag in CLI, or define a APP_DEBUG=1 environment variable to control debug behavior, though it does not allow overriding an enabled debug config with a disabled debug CLI flag. That would require me to use the explicit bool option in Tap.parse_args in addition to tracking the set/not-set state of an attribute like a lot of other packages end up doing.

I hope this makes sense and gives you an idea of how others can make use of better integration with Pydantic.

kddubey commented 5 months ago

This lets me override any configuration using CLI flags, and allows to use Pydantic validators to convert CLI flags to the data that the rest of the app expects.

Wow, very cool to see that level of sophistication @tinkerware. Thank you for sharing this example and explaining it!

kddubey commented 5 months ago

@swansonk14 minor note: should the to_tap_class feature also be mentioned in the release notes?

martinjm97 commented 5 months ago

I've updated the release notes with the example from @tinkerware as well as a discussion of the to_tap_class that @kddubey was gracious enough to remind us of. Thank you both for your contributions!

--Jesse

swansonk14 / typed-argument-parser

`to_tap_class`, and inspect fields instead of signature for data models #128

Future refactor