pydantic / pydantic-settings

Settings management using pydantic
https://docs.pydantic.dev/latest/usage/pydantic_settings/
MIT License
555 stars 55 forks source link

No arguments to CLI subparser fails to raise an error #335

Closed mpkocher closed 1 week ago

mpkocher commented 1 month ago

With pydantic-settings 2.3.4, when no arguments are provided to a subparser style CLI, there is no exception raised.

I believe a user should not have to explicitly catch for these None cases on the "Root" Settings model.

Acceptance criteria:

To demonstrate this issue:

import sys
from pydantic import BaseModel
from pydantic_settings import (BaseSettings, CliSubCommand)

class Alpha(BaseModel):
    """Alpha Utils"""
    name: str

class Beta(BaseModel):
    """Beta Utils"""
    age: int

class Root(BaseSettings, cli_parse_args=True, cli_prog_name="example"):
    alpha: CliSubCommand[Alpha]
    beta: CliSubCommand[Beta]

def runner_01(r: Root) -> int:
    print(f"Running {r}")
    return 0

if __name__ == '__main__':
    sys.exit(runner_01(Root()))

Running would be:

> python example.py                    
Running alpha=None beta=None
> echo $?
0

It's possible to workaround the current design by mapping these Optional fields of Root to a tuple, then case matching (alternatively, writing with if/elif).

def runner_02(r: Root) -> int:
    match (r.alpha, r.beta):
        case (None, None): raise ValueError(f"Invalid {r}")
        case (None, beta): print(f"Running beta {beta=}")
        case (alpha, None): print(f"Running {alpha=}")
        case (_, _): raise ValueError(f"Invalid {r}")
    return 0

However, it's a bit clunky and not particularly obvious that you need to handle these None cases. I believe the structural validation/checking should be occurring at the pydantic-settings level.

Misc Feedback

From a type standpoint, the current design is a bit peculiar because it leans on None.

class Root(BaseSettings, cli_parse_args=True, cli_prog_name="example"):
    alpha: CliSubCommand[Alpha] # sugar for Optional[_CliSubCommand]
    beta: CliSubCommand[Beta]

I believe the it would be useful to use a union type in the design.

class Root(BaseSettings, cli_parse_args=True, cli_prog_name="example"):
    command: Alpha | Beta

Calling a function/main would be:

def runner(m: Model):

    match m.command:
        case Alpha(): print(f"Running alpha={m.command}")
        case Beta(): print(f"Running beta={m.command}")

In general, it seems like there's a core part of the API that is missing the plumbing layer of calling your function with the initialized and validated settings.

There's also bit of friction with using mypy.

from pydantic_settings import BaseSettings

class Root(BaseSettings, cli_parse_args=True):
    name: str

if __name__ == "__main__":
    print(Root())

Yields

> mypy example.py         
example.py:9: error: Missing named argument "name" for "Root"  [call-arg]
Found 1 error in 1 file (checked 1 source file)

Perhaps this is general issue with pydantic-settings and should be filed as a separate ticket.

hramezani commented 1 month ago

Thanks @mpkocher for reporting this issue.

@kschwab Could you please take a look at this issue?

@mpkocher regarding mypy issue, did you enable Pydantic mypy plugin?

kschwab commented 1 month ago

Hey @mpkocher, thanks for the feedback and ticket. You raised a lot of good points.

No arguments to CLI subparser fails to raise an error

Currently this is expected since CLI subcommands are treated as optional. However, it should be made configurable to support the use case you are raising. This would actually better align with argparse which makes subcommands optional by default, but has the ability to make them required (added in 3.7).

Useful to use a union type in the design

Yes, I agree with this somewhat as we had a similar approach before landing on the current solution. However, when digging deeper, unless we're wanting to support multiple subcommands, the current solution provides more benefits. There is a longer outline of my thoughts here, but the summary is:

If you have strong opinions otherwise, I would be interested in hearing your thoughts. Lastly, I would say the main desire for a union approach primarily comes down to being able to do what was shown in the examples:

class Root(BaseSettings, cli_parse_args=True, cli_prog_name="example"):
    command: Alpha | Beta

def runner(m: Model):

    match m.command:
        case Alpha(): print(f"Running alpha={m.command}")
        case Beta(): print(f"Running beta={m.command}")

i.e., having the ability to directly reference a "subcommand" from an object. I completely agree with this, and it feeds into your last point I'll cover below.

As a side note, the Optional[_CliSubCommand] is unrelated to CLI subcommands being treated as "optional". That's more due to the internal parsing logic and the need for a way to control the setting (as discussed in first point).

Clunky and not particularly obvious that you need to handle these None cases.

Yep, I agree 💯 with this, especially when viewing it from a CLI point of view (i.e., use case #2), which I think is the dominate use case for subcommands.

My thoughts are to add a simple function that returns the subcommand for a model. Given your feedback, we could additionally add an is_required flag to raise an exception if the subcommand is not set, e.g.:

import sys
from pydantic import BaseModel
from pydantic_settings import (BaseSettings, CliSubCommand)
from pydantic_settings import get_subcommand # proposed CLI subcommand retrieval function (hand-wavy name)

class Alpha(BaseModel):
    """Alpha Utils"""
    name: str

    def cli_main(self) -> int:
        print(f"Running alpha {self}")
        return 0

class Beta(BaseModel):
    """Beta Utils"""
    age: int

    def cli_main(self) -> int:
        print(f"Running beta {self}")
        return 0

class Root(BaseSettings, cli_parse_args=True, cli_prog_name="example"):
    alpha: CliSubCommand[Alpha]
    beta: CliSubCommand[Beta]

    def cli_main(self) -> int:
        print(f"Running root {self}")
        return get_subcommand(self, is_required=True).cli_main()

if __name__ == '__main__':
    sys.exit(Root().cli_main())
kschwab commented 1 month ago

Updated subcommand docs can be found here in PR.

kschwab commented 1 month ago

@mpkocher, I reviewed your feedback on including an "app" layer, to which I agree. Overall, my impressions are the CliApp would allow us to better tailor the experience out of the box for use case 2. We could move to formalize something like the below, which accomplishes what you're looking for (at least for the CLI):

import sys
from pydantic import BaseModel, Field
from typing import Optional, NoReturn, Any, override
from pydantic_settings import BaseSettings, CliSubCommand, get_subcommand
from abc import ABC, abstractmethod

class CliApp(BaseSettings, ABC):
    def __init__(self) -> None:
        ...

    def _finalize(self, argv: list[str] | bool) -> Any:
        super().__init__(
            _cli_parse_args=argv,
            _env_parse_none_str='null',
            # etc.
        )
        return self

    @classmethod
    def run(cls, argv: Optional[list[str]] = None, is_exit: bool = True) -> int | NoReturn:
        status = cls.main(cls()._finalize(argv if argv else True))
        if not is_exit:
            return status
        sys.exit(status)

    @abstractmethod
    def main(self) -> int:
        ...

class Alpha(BaseModel):
    name: str
    age: int

class AlphaCmd(Alpha):
    def run(self) -> int:
        print(self.name, self.age)
        return 0

class BetaCmd(BaseModel):
    color: Optional[str]
    def run(self) -> int:
        print(self.color)
        return 0

class App(CliApp):
    alpha: CliSubCommand[AlphaCmd]
    beta: CliSubCommand[BetaCmd]
    is_get_version: bool = Field(False, alias='version')

    def _get_version(self) -> int:
        print('1.1.1')
        return 0

    @override
    def main(self) -> int:
        if self.is_get_version:
            return self._get_version()
        return get_subcommand(self).run()

if __name__ == '__main__':
    sys.argv = ['app', 'beta', '--color', 'blue']
    App().run()
    #> blue

Customization of sources would still be done by overriding settings_customise_sources, which is related to your other feedback and is outside of scope for this issue and CLI settings.

Additionally, the above will handle nested subcommands and top level flags or args. Nested subcommands are accomplished by using the get_subcommmand recursively in the main or subcommand routines.

Lastly, if desired, subcommand: CliSubCommand[AlphaCmd | BetaCmd] notation can be used with CLI names derived from class name. However, I would still discourage this approach in favor of keeping them unique for reasons already outlined previously.

If the above looks good, we can plan on bringing it in after some cleanup and doc updates with this fix.

mpkocher commented 1 month ago
  1. I am not sure NoReturn should be used in an Union type.
  2. To avoid this, I would suggest make it explicit run(argv: list[str] | None = None) -> int and run_and_exit(argv: list[str] | None) -> NoReturn
  3. The subcommands need to be configurable as a string, not as a python identifier (which has fundamental restrictions). For example "run-tests" is a valid subcommand, but it is not a validate python identifier. I used cli_name in the Cmd interface to explicit configure the command name. If the CLI Sources config doesn't have cli_name then, it will use the Cmd value (or default to __file__). This works for the Subcommand and non-Subcommand parsers.
  4. To avoid another import, make get_subcommand a method on CliApp. (maybe this is done for implementation reasons?)

Please feel free to open another ticket to update/improve the docs. I can take a pass at that if you want.

kschwab commented 1 month ago

@mpkocher, I've finalized most the implementation addressing your concerns above.

I am not sure NoReturn should be used in an Union type. To avoid this, I would suggest make it explicit run(argv: list[str] | None = None) -> int and run_and_exit(argv: list[str] | None) -> NoReturn

This has now diverged. Instead of run returning int, run returns the result of the cli_cmd method defined by users (i.e., Any).

The subcommands need to be configurable as a string, ... For example "run-tests" is a valid subcommand, ...

This is an alias use case. I've updated subcommand and positional args to handle alias assignment.

To avoid another import, make get_subcommand a method on CliApp. (maybe this is done for implementation reasons?)

I completely agree with this. It was because nested subcommands weren't of type CliApp (or BaseSettings), so it necessitated some way of grabbing them from BaseModel types generically.

I've updated the solution to provide everything under CliCmd, which simply provides command structure and methods to the models. Still considering whether to break run and finalize out under a seperate CliCmdRoot class. Currently it raises an exception if they are called from a non-root instance.

class CliCmd(ABC):
    def finalize(self, args: list[str] | None = None,
                 cli_settings_source: CliSettingsSource[Any] | None = None) -> Any:
        """Finalize the CliCmd object by parsing the CLI args."""
        ...

    def run(self, args: list[str] | None = None,
            cli_settings_source: CliSettingsSource[Any] | None = None) -> Any:
        """Parse the CliCmd args and run the cli_cmd method."""
        ...

    def get_subcommand(self) -> Any:
        """Get the subcommand model."""
        ...

    def run_subcommand(self, *args: Any, **kwargs: Any) -> Any:
        """Run the subcommand cli_cmd method."""
        ...

    @abstractmethod
    def cli_cmd(self, *args: Any, **kwargs: Any) -> Any:
        """The cli command method entry point."""
        ...

Revisiting our example from before, I've updated it with CliCmd and a run-tests subcommand:

import sys
from pydantic import BaseModel, Field
from typing import Optional, override
from pydantic_settings import BaseSettings, CliCmd, CliSubCommand

class Alpha(BaseModel):
    name: str
    age: int

class AlphaCmd(Alpha, CliCmd):
    @override
    def cli_cmd(self) -> int:
        print(self.name, self.age)
        return 0

class BetaCmd(BaseModel, CliCmd):
    color: Optional[str]
    @override
    def cli_cmd(self) -> int:
        print(self.color)
        return 0

class App(BaseSettings, CliCmd):
    alpha: CliSubCommand[AlphaCmd]
    beta: CliSubCommand[BetaCmd] = Field(alias='run-tests')

    is_get_version: bool = Field(False, alias='version')
    def get_version(self) -> int:
        print('1.1.1')
        return 0

    @override
    def cli_cmd(self) -> int:
        if self.is_get_version:
            return self.get_version()
        return self.run_subcommand()

if __name__ == '__main__':
    sys.argv = ['app', 'run-tests', '--color', 'blue']
    sys.exit(App().run())
    #> blue
mpkocher commented 1 month ago

For your team or org projects, you can use whatever level of gradual typing that would make sense for those requirements. However, for core libraries, I believe you want to stay more on the typed side the spectrum and avoid Any.

  1. If possible try to avoid using Any (if possible). The current use of Any mutes the usefulness of mypy.
  2. Cmd is already defined as an abstraction, It's unclear why CliSubCommand is necessary.
  3. I would still push for App to have cmd: Cmd | dict[str, Cmd] where the str would be the subcommand name.

Unfortunately, I think I'm running out of gas on this one. I believe you should do whatever it aligned with your requirements and vision for this feature. Perhaps other community members could provide feedback.

kschwab commented 1 month ago

No worries @mpkocher, the design has stabilized since then.

I ended up heading in a direction similar to click or typer using a decorator-based approach. IMO, it's an even better fit when paired with the class structure and typing that Pydantic provides; it's very intuitive. It also allows for us to extend the CLI usage to BaseModel derived types, not just BaseSettings. All of this is now bundled under CliApp, which provides:

Using the @CliApp.main decorator on a BaseModel or BaseSettings derived class will run the cli_cmd method by default on instantiation:

from pydantic import BaseModel
from pydantic_settings import CliApp

@CliApp.main
class App(BaseModel):

    def cli_cmd(self) -> None:
        print("Hello World")

App()
"""
Hello World
"""

Alternatively, we can use the @CliApp.command decorator to specifically mark the command method we want to run, e.g., my_cli_cmd:

from pydantic import BaseModel
from pydantic_settings import CliApp

@CliApp.main
class App(BaseModel):

    @CliApp.command
    def my_cli_cmd(self) -> None:
        print("Hello World")

App()
"""
Hello World
"""

Adding subcommands, we can run them using CliApp.run_subcommand:

import sys
from typing import Optional
from pydantic import BaseModel
from pydantic_settings import CliApp, CliSubCommand

class AlphaCmd(BaseModel):
    name: str
    age: int

    def cli_cmd(self) -> None:
        print("AlphaCmd", self.name, self.age)

class BetaCmd(BaseModel):
    color: Optional[str]

    def cli_cmd(self) -> None:
        print("BetaCmd", self.color)

@CliApp.main
class App(BaseModel):
    alpha: CliSubCommand[AlphaCmd]
    beta: CliSubCommand[BetaCmd]

    def cli_cmd(self) -> None:
        CliApp.run_subcommand(self)

sys.argv = ["example.py", "beta", "--color=green"]
App()
"""
BetaCmd green
"""

Or, we can use CliApp.run if we want to directly provide cli_args:

CliApp.run(App, cli_args=["beta", "--color=green"])
"""
BetaCmd green
"""

CliApp.run is generic, meaning so long as a BaseModel or BaseSettings derived class has a cli_cmd or CliApp.command, it can run without the need of a CliApp.main decorator:

CliApp.run(BetaCmd, cli_args=["--color=green"])
"""
BetaCmd green
"""

Furthermore, both CliApp.run and CliApp.run_subcommand return None type. Therefore, if a user wants a specific return type and exit code strategy etc., they can use CliApp.get_subcommand:

import sys
from typing import NoReturn, Optional
from pydantic import BaseModel
from pydantic_settings import CliApp, CliSubCommand

class AlphaCmd(BaseModel):
    name: str
    age: int

    def cli_cmd(self) -> int:
        print("AlphaCmd", self.name, self.age)
        return 0

class BetaCmd(BaseModel):
    color: Optional[str]

    def cli_cmd(self) -> int:
        print("BetaCmd", self.color)
        return 0

@CliApp.main
class App(BaseModel):
    alpha: CliSubCommand[AlphaCmd]
    beta: CliSubCommand[BetaCmd]

    def cli_cmd(self) -> NoReturn:
        sys.exit(CliApp.get_subcommand(self).cli_cmd())

sys.argv = ["example.py", "beta", "--color=green"]
App()
"""
BetaCmd green
"""

Of course, given CliApp.get_subcommand returns Any and is not strictly typed, a user can use a CliSubCommand union if desired (which comes with the usual tradeoffs):

@CliApp.main
class App(BaseModel):
    subcommand: CliSubCommand[AlphaCmd | BetaCmd]

    def cli_cmd(self) -> NoReturn:
        sys.exit(self.subcommand.cli_cmd())

However, I would not actively encourage the cli_cmd -> int pattern we've been kicking around above. The recommendation is to return None in successful cases, and otherwise raise exception or exit with non-zero status for failing cases. It's a bit superfluous, and not how most CLI app libs (e.g., click, typer, etc.) operate.

Lastly, with respect to the last point raised:

I would still push for App to have cmd: Cmd | dict[str, Cmd] where the str would be the subcommand name.

I would disagree for several reasons:

  1. cmd is now type Cmd OR dict[str, Cmd]. At the very least, it should be Annotated[Cmd, 'my_cmd_name'], otherwise this breaks use case 1. Admitedly, somthing like Annotated within the CliSubCommand union case would be useful. I'll give it some further thought.
  2. An alias is only necessary if the command is not a valid python identifier (a niche case). In majority of cases, it's much easier to simply name the variable the name of the command, e.g. my_cmd_name: Cmd
  3. Aliases are already well defined by Pydantic with more features (e.g., AliasPath). Meaning, it not only breaks consistency with Pydantic, but also with how CLI short and long options are aliased.

Given the above, I don't see the upside for using a Cmd | dict[str, Cmd] approach.

kschwab commented 1 month ago

@hramezani would you like me to proceed forward with adding a CLI app layer (as discussed above), or only resolve this issue with get subcommand?

hramezani commented 1 month ago

@kschwab I couldn't follow the whole discussion. Adding the CLI app layer is a structural change and I would prefer to prevent it for now. let's fix the issue for now.

kschwab commented 1 month ago

PR is ready. I stripped the branch back to the original fix. We can raise the other issues separately.