Closed kddubey closed 6 months ago
Hi @kddubey,
The PR looks great so far! Thank you for all the work you put in!
@swansonk14 and I will meet to discuss this PR within a week.
My opinion is that both of the advantages you listed of `tap_class_from_data_model` over `tapify` should be incorporated into `tapify`. As a result, I'm inclined to put the new support for Pydantic in `tapify` as well.
We'll keep you posted. Thank you again!
--Jesse
Hi @kddubey,
We are in support of two different functions: `convert_to_tap_class` and `tapify`. `convert_to_tap_class` (a renaming of `tap_class_from_data_model`) produces a Tap class from a class, dataclass, function, or pydantic model, and `tapify` would call `convert_to_tap_class` under the hood and then either instantiate the class or run the function.
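A minimal stdlib sketch of that layering (toy names and argparse stand-ins, not Tap's actual implementation): one function converts an object's signature into a parser, and a tapify-like wrapper calls it under the hood, then instantiates the class or runs the function.

```python
import argparse
import inspect

def convert_to_parser(obj):
    """Toy stand-in for convert_to_tap_class: build a parser from obj's signature."""
    parser = argparse.ArgumentParser()
    for name, param in inspect.signature(obj).parameters.items():
        required = param.default is inspect.Parameter.empty
        parser.add_argument(
            f"--{name}",
            type=param.annotation if param.annotation is not inspect.Parameter.empty else str,
            required=required,
            default=None if required else param.default,
        )
    return parser

def toy_tapify(obj, cli_args=None):
    """tapify-like wrapper: parse, then instantiate the class or run the function."""
    parsed = convert_to_parser(obj).parse_args(cli_args)
    return obj(**vars(parsed))

def add(a: int, b: int = 2) -> int:
    return a + b

result = toy_tapify(add, ["--a", "3"])
print(result)  # 5
```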
Here's a version that we were working on https://github.com/swansonk14/typed-argument-parser/blob/convert_to_tap/tap/tapify.py, but we haven't had time to complete it. Any more contributions in this direction would be greatly appreciated! Thank you for what you have done already!
We'll revisit this PR in the next few weeks and find a way to get this into main!
Best, JK (Jesse and Kyle)
@martinjm97 thank you for the feedback about `convert_to_tap_class`, and for the typing fixes from a few days ago. I updated the PR. Current tests should be passing.
Remaining todos for me before I un-draft this PR and ask for a full review:

- `pydantic` dataclasses and `BaseModel`s in `tests/test_tapify.py`
- `to_tap_class` (rn it's just demo'd)
- `pydantic` v1

Thank you so much again! Wow!
The PR is ready for review @martinjm97
(Vocab note: "data model" refers to builtin dataclasses, Pydantic models, and Pydantic dataclasses. If something is a data model, it means it has "fields" which can be publicly inspected to extract argument data like the type annotation, default value, and description.)
Summary of changes:
- `tap/tapify.py`: for data models, look at the field info instead of the signature. The only benefit is that field descriptions can be used as argument descriptions, which improves the `-h` message. It might also be more canonical/robust to inspect fields instead of the signature for data models. If you'd like to keep the current behavior (inspect the signature, not fields), change `_tap_data` to just return `_tap_data_from_class_or_function(...)`.
- `tap/tapify.py`: add a new function, `to_tap_class`, which takes a class, function, data model class, or data model instance, and returns a `Tap` class (not an instance). It allows a user to do arbitrary argument handling for any object.
- `tests/test_tapify.py`: add tests for Pydantic data models. These should already work in main; I just added tests for them.
- `tests/test_to_tap_class.py`: new test suite for `tap.to_tap_class`.
Very sorry! @swansonk14 and I are just clearing out time to give this a proper code review. This is our number 1 priority on Tap until it's in. Thank you so much for your tremendous effort and extremely high quality work.
--Jesse
No worries, take your time :-)
Hi @kddubey,
Thank you for all the tremendous work you have put into this PR! We learned a lot from your use of pytest fixtures/parametrize and appreciated the organization of `tapify` and `to_tap_class`.

We read over all of your code and it looks great! However, when we started testing we found a small bug. When we `tapify`d a pydantic class and provided an unused argument, `tapify` did not raise an error during parsing even though it should have.
Dataclasses have the desired functionality:

```python
from dataclasses import dataclass

from tap import Tap, tapify

@dataclass
class Model:
    id: int = 1

args = tapify(Model)
print(args.id)
```

```
>>> python demo_data_model.py --id 7 --junk 89
usage: demo_data_model.py [--id ID] [-h]
demo_data_model.py: error: unrecognized arguments: --junk 89
```
`pydantic` classes do not raise an error:

```python
from pydantic import BaseModel

from tap import Tap, tapify

class Model(BaseModel):
    id: int = 1

args = tapify(Model)
print(args.id)
```

```
>>> python demo_data_model.py --id 7 --junk 89
7
```
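For context, plain argparse shows the same split between strict and lenient parsing, which is likely where the difference originates (a stdlib sketch, not Tap's actual code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--id", type=int, default=1)

# Strict parsing: unrecognized arguments make argparse error and exit,
# matching the dataclass behavior above.
try:
    parser.parse_args(["--id", "7", "--junk", "89"])
    strict_rejected = False
except SystemExit:
    strict_rejected = True

# Lenient parsing: extras are silently collected, matching the pydantic behavior.
args, extras = parser.parse_known_args(["--id", "7", "--junk", "89"])
print(strict_rejected, args.id, extras)
```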
We still want to spend more time testing your implementation before merging it into main.
Thank you so much again for all of this fantastic work! We plan to merge this in as soon as we've finished testing and (you or we) have fixed issues that arise.
Best, JK
@martinjm97 see this comment and this comment. Note that, by default, a Pydantic `BaseModel` (surprisingly, to me at least) ignores extra fields that are passed to it; it doesn't raise an error. Should I change the two variables (mentioned in the first comment) to be `False` so that supplying extra arguments always raises an error?
Sorry for missing your comments! We'll be sure to read all the comments before reporting supposed bugs!
Thanks, JK
No worries. That comment from a week ago wasn't really clear now that I'm reading it. And I appreciate that you're testing the code independently!
Hi @martinjm97, the last commit changes the Pydantic implementation to disallow extra arguments for Pydantic `BaseModel`s by default. That way it's backwards compatible and sensible. Sorry for causing confusion.
I also added support for passing extra command line args to the pydantic model if the model is configured that way. This change is backwards compatible, i.e., you can already do this in main.
For example, take this script:

```python
# demo_extra_args.py
from pydantic import BaseModel, ConfigDict

from tap import tapify

class User(BaseModel):
    model_config = ConfigDict(extra="allow")  # explicitly allow extra attributes

    name: str

if __name__ == "__main__":
    user = tapify(User)
    print(user)
```

Running

```
python demo_extra_args.py --name Tapify --number 12
```

prints

```
name='Tapify' number='12'
```
(Extra args get cast to string, which matches what happens in main. The user can add custom validation if they want to change this behavior.)
Removing the `model_config = ...` line will cause that last command to raise an error:

```
usage: demo_extra_args.py --name NAME [-h]
demo_extra_args.py: error: unrecognized arguments: --number 12
```
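For readers curious how extra flags can end up as string attributes, here's a hedged stdlib sketch of the idea (illustrative only, not the actual implementation): leftover `--flag value` pairs from lenient parsing stay as strings unless validated later.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--name", required=True)

known, extra = parser.parse_known_args(["--name", "Tapify", "--number", "12"])
# Pair up leftover "--flag value" tokens; the values remain strings.
extra_kwargs = {extra[i].lstrip("-"): extra[i + 1] for i in range(0, len(extra), 2)}
merged = vars(known) | extra_kwargs
print(merged)  # {'name': 'Tapify', 'number': '12'}
```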
Lmk what you think. Thanks!
Hi @kddubey,
Thank you for all of the extensive testing and bearing with us! @swansonk14 and I went through all of the examples in https://docs.pydantic.dev/latest/concepts/fields/ and did our own testing. Everything that we could think of seems to be working.
Thank you again for all your hard work and excellent execution! We will merge it in. We plan to do another pass over code to make small changes to the code style and comments before including it in the next release.
Best, JK
Sounds good, and feel free to lmk if you need any help or clarification on this code in the future :-)
Thank you! It has been a pleasure to learn from your work and to work with you. Sorry for the slow turnaround. Kyle and I are so happy this landed!
--Jesse
I'd love to see a new release that includes this change; looking forward to using Pydantic models with `tapify`.
@tinkerware `tapify` should already work w/ Pydantic models. It's just that the field descriptions are missing from the `-h` message.
We've been trying to find time to get it at least above the codecov threshold before making the new release. Sorry for the delay!
@martinjm97 Happy to help w/ that since I'm not as busy. And cuz I broke it lol :-)
I believe the root cause is that the coverage % is based on a coverage report generated by tests run in an environment which doesn't have `pydantic` installed. The best solution is to take the union of covered lines across the 3 test environments in the testing workflow (no pydantic, pydantic v1, pydantic v2), but I'm not sure if that feature is supported. Will look into it soon
@kddubey, that would be absolutely spectacular if possible. No worries if it doesn't work out though. Thank you again!
@tinkerware, thanks to the work of @kddubey, the implementation was prepared for release. You can see it here: https://github.com/swansonk14/typed-argument-parser/releases/tag/v_1.10.0.
--Jesse
Great, now I can ditch the git ref in my poetry dependencies list! I needed the changes in this PR so that I could use `to_tap_class`.
Thanks again to the OP and the maintainers for all the work and effort you put into this project!
So, I wanted to write up a quick summary of how I'm using Pydantic and this project together to parse CLI flags, so it can be a pointer for others looking to do the same and for me to learn from comments.
I use Pydantic to parse configuration and to represent the directives of the CLI flags with a "command" abstraction:
```python
from typing import Literal, TypeAlias

from pydantic import BaseModel, Field, field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict

from tap import to_tap_class

class AppConfig(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="APP_",
        env_file='.env',
        env_file_encoding='utf-8',
        env_ignore_empty=True,
        str_strip_whitespace=True,
        revalidate_instances='always',
        frozen=True,
    )

    api_key: str | None = None
    debug: bool = False
    ...

ShowOption: TypeAlias = Literal[...]
ShowInput: TypeAlias = Literal['all', ShowOption]

class AppCommand(BaseModel):
    """Makes hard things easy in a pretty way."""

    show: list[ShowInput] = Field(
        default=["sensible-default"],
        description="...",
    )
    ...

    @field_validator("show")
    @classmethod
    def validate_show(cls, show: list[ShowInput]) -> list[ShowOption]:
        if "all" in show:
            return [...]  # can expand convenience flags into underlying options
        return show

def parse_flags(*flags: str) -> tuple[AppConfig, AppCommand]:
    class AppFlags(to_tap_class(AppConfig), to_tap_class(AppCommand)):
        def configure(self) -> None:
            self.description = AppCommand.__doc__

    args = AppFlags().parse_args(args=flags or None).as_dict()
    return env_config(**args), AppCommand(**args)
```
This lets me override any configuration using CLI flags, and lets me use Pydantic validators to convert CLI flags into the data the rest of the app expects.
Configuration overrides from CLI flags look something like this (e.g. the `env_config` function above):

```python
from typing import Any

from pydantic import ValidationError

try:
    _ENV_CONFIG = AppConfig()
except ValidationError as e:
    fail("Ensure all `APP_*` environment variables are set correctly: {}", e)

def env_config(**overrides: Any) -> AppConfig:
    if not overrides:
        return _ENV_CONFIG
    # The actual changes are overrides that are different from default values.
    changes = (
        _ENV_CONFIG
        .model_copy(update=overrides)
        .model_dump(exclude_defaults=True)
    )
    return AppConfig.model_validate(
        _ENV_CONFIG.model_copy(update=changes)
    )
```
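The same "only apply overrides that actually change something" pattern can be sketched with stdlib dataclasses (illustrative only; it skips the `exclude_defaults` subtlety of the pydantic version above):

```python
from __future__ import annotations

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class AppConfig:
    api_key: str | None = None
    debug: bool = False

_ENV_CONFIG = AppConfig(api_key="from-env")  # stands in for env/.env parsing

def env_config(**overrides) -> AppConfig:
    if not overrides:
        return _ENV_CONFIG
    # replace() returns a new frozen instance with the overrides applied.
    return replace(_ENV_CONFIG, **overrides)

cfg = env_config(debug=True)
print(cfg)  # AppConfig(api_key='from-env', debug=True)
```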
This lets me pass a `--debug` flag on the CLI, or define an `APP_DEBUG=1` environment variable, to control debug behavior. It does not, however, allow overriding an enabled debug config with a disabled debug CLI flag. That would require me to use the explicit bool option in `Tap.parse_args`, in addition to tracking the set/not-set state of an attribute like a lot of other packages end up doing.
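That set/not-set tracking can be sketched with stdlib argparse (an illustration, not what Tap does): a `None` default distinguishes "flag absent" from an explicit `--debug`/`--no-debug`, so a CLI flag can disable a debug mode enabled by the environment.

```python
import argparse

parser = argparse.ArgumentParser()
# default=None means "not set on the CLI"; BooleanOptionalAction adds --no-debug.
parser.add_argument("--debug", action=argparse.BooleanOptionalAction, default=None)

env_debug = True  # e.g. from APP_DEBUG=1

results = {}
for argv in ([], ["--debug"], ["--no-debug"]):
    cli = parser.parse_args(argv).debug
    # Fall back to the environment only when the flag wasn't given at all.
    results[tuple(argv)] = env_debug if cli is None else cli
print(results)  # {(): True, ('--debug',): True, ('--no-debug',): False}
```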
I hope this makes sense and gives you an idea of how others can make use of better integration with Pydantic.
> This lets me override any configuration using CLI flags, and lets me use Pydantic validators to convert CLI flags into the data the rest of the app expects.
Wow, very cool to see that level of sophistication @tinkerware. Thank you for sharing this example and explaining it!
@swansonk14 minor note: should the `to_tap_class` feature also be mentioned in the release notes?
I've updated the release notes with the example from @tinkerware as well as a discussion of `to_tap_class`, which @kddubey was gracious enough to remind us of. Thank you both for your contributions!
--Jesse
(Updated summary of changes)
This is a WIP following up on #125. It should also close #112.
2 advantages to using `tap_class_from_data_model` instead of `tapify`:

- Field descriptions are surfaced in the `-h` help string.
- It returns a `Tap` class, which allows the user to override `configure` and `process_args`. See the example here.

**Future refactor**

I'm thinking we can allow `tapify` to return a `Tap` class instead of calling the input object.

@martinjm97 What do you think of this refactor? Lmk if I should open a new issue for this, or a PR if that makes things more concrete.

For now, my plan is to focus on the todos above. The downside to merging in `tap_class_from_data_model` is that there would be 2 interfaces, `tapify` and `tap_class_from_data_model`, which "tapify" a Pydantic model or dataclass, when there could be 1: `tapify`.
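To make the second advantage concrete, here's a hedged stdlib sketch (toy names; not Tap's API) of why returning a class rather than an instance is useful: the caller can subclass it and override a `configure`-style hook before instantiation.

```python
import argparse
from dataclasses import MISSING, dataclass, fields

def parser_class_from_dataclass(model):
    """Toy analogue of tap_class_from_data_model: returns a parser *class*."""
    class ModelParser(argparse.ArgumentParser):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            for f in fields(model):
                required = f.default is MISSING and f.default_factory is MISSING
                self.add_argument(
                    f"--{f.name}",
                    type=f.type if callable(f.type) else str,
                    required=required,
                    default=None if required else f.default,
                )
            self.configure()  # subclass hook, in the spirit of Tap.configure

        def configure(self):
            pass

    return ModelParser

@dataclass
class Model:
    id: int = 1

class MyParser(parser_class_from_dataclass(Model)):
    def configure(self):
        self.description = "overridden by the user"

parsed = MyParser().parse_args(["--id", "7"])
print(parsed.id)  # 7
```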