Closed simeoncarstens closed 1 month ago
We (@zz1874, I) wrote a hacky demo script that uses a CLI based on pydantic-argparse
to run the hello_looper
example. You can find it on a branch in our fork of the repository. It serves as a proof-of-concept of how to define CLI arguments in a pydantic
model.
Going forward, I (this has not been discussed with neither @nsheff nor @zz1874) propose to organize the pydantic
-based CLI rewrite as follows:
At the core, we introduce a dataclass
called Argument
, which holds all kind of information we need to know about a CLI argument / flag. That includes for now:
str
),""
),("run", "runp", "rerun")
, or possibly treat commands as an enum).We then create a data structure (a tuple, or an enum) that holds an Argument
instance for each argument there is. We can then draw from this pool of argument definitions to dynamically create pydantic
models for each command via pydantic.create_model()
. This approach allows us to define arguments only once and reuse these definitions easily for different commands.
I am working on a first implementation of this approach in the following branch: https://github.com/tweag/looper/tree/tweag/pydantic-command-models
Given that we're running into issues with packages that don't support pydantic 2.0 (see here for example: https://github.com/pepkit/pipestat/issues/127), and that pydantic-argparse appears to be at a dead-end, unresponsive as to whether pydantic 2.0 will ever be supported (https://github.com/SupImDos/pydantic-argparse/issues/48), we might want to rethink using pydantic-argparse, and instead, it might make more sense to just roll our own argparser from the Argument
objects, for example, with something like this:
ttps://stackoverflow.com/questions/72741663/argument-parser-from-a-pydantic-model
See also: https://github.com/edornd/argdantic
Given that we're running into issues with packages that don't support pydantic 2.0 (see here for example: pepkit/pipestat#127), and that pydantic-argparse appears to be at a dead-end, unresponsive as to whether pydantic 2.0 will ever be supported (SupImDos/pydantic-argparse#48), we might want to rethink using pydantic-argparse, and instead, it might make more sense to just roll our own argparser from the
Argument
objects, for example, with something like this:
I just ran into this issue with pydantic > 2.0.0 support.
Recently, we forced requirement for pephubclient >=0.4.0 ( see https://github.com/pepkit/looper/issues/453, https://github.com/pepkit/looper/commit/1c8a8ad138b1d5096a40118e1a9bfdd6ed8ca203).
As I continue working on migrating the arguments to the pydantic models, I am now running into dependency issues because pephubclient requires pydantic greater than 2.5.0 but pydantic-argparse depends on pydantic<2.0.0:
ERROR: Cannot install looper and pydantic-argparse==0.8.0 because these package versions have conflicting dependencies.
The conflict is caused by:
pephubclient 0.4.0 depends on pydantic>2.5.0
pydantic-argparse 0.8.0 depends on pydantic<2.0.0 and >=1.9.0
Unfortunately, I've gotten quite far in finishing the migration before realizing this issue (I realized it when I merged some downstream changes upstream).
There is a fork of pydantic-argparse that uses pydantic2: https://github.com/anastasds/pydantic2-argparse
I was able to implement it here and resolve dependency issues: https://github.com/pepkit/looper/commit/066e661bf2b55ad0a7d120147e7f81c23b629913
Interestingly, the fork (and now my looper implementation) requires calling the still available pydantic.v1 api:
import pydantic.v1 as pydantic
Another issue as I rework the tests:
For arguments that can be given a list (e.g. selecting or excluding based on multiple flags), the argparser cannot seem to handle a list:
--looper-config .looper.yaml --exc-flag "failed" "running" run --dry-run
I've tried different permutations of the syntax as well as changing the Field type but am still striking out.
@donaldcampbelljr interesting issue! I investigated a bit and the problem seems to be that pydantic-argparse
doesn't know how to distinguish between list elements (such as "failed" and "running") and the subcommand ("run").
In fact, a related issue can be demonstrated with just the usual argparse
Python module:
from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument("--somelist", nargs="+")
parser.add_argument("-v", "-voo")
subparsers = parser.add_subparsers(title="subcommands")
cmd_parser = subparsers.add_parser("cmd")
cmd_parser.add_argument("--somestr", type=str)
args = parser.parse_args()
yields
$ python argtest.py --somelist a b cmd --somestr "Asdf"
usage: argtest.py [-h] [--somelist SOMELIST [SOMELIST ...]] [-v V] {cmd} ...
argtest.py: error: argument {cmd}: invalid choice: 'Asdf' (choose from 'cmd')
One way to "fix" ( :roll_eyes: ) this is to use a non-list-valued argument before the subcommand, for example:
$ python argtest.py --somelist a b -v bla cmd --somestr "Asdf"
which parses correctly.
But unfortunately it seems like a bug fix in pydantic[2]-argparse
would be required for even that workaround to work :slightly_frowning_face: I'd be happy to look into how to fix this!
@simeoncarstens Interesting! Thank you for taking a look. I was also digging into it this morning. I noticed that the compute argument (also a list) does appear to be working,but it is added to the Run parser and, thus, comes after the run command:
looper --looper-config .looper.yaml run --dry-run --compute PARTITION=standard cores='32' mem='32000'
this also works (moving the --dry-run after the list):
looper --looper-config .looper.yaml run --compute PARTITION=standard cores='32' mem='32000' --dry-run
I was thinking that one workaround would be to add all the shared arguments to the individual commands so that the list arguments can be used after the command parser. Not ideal but it might be necessary for this setup.
So, I tried out my proposed suggestion here: https://github.com/pepkit/looper/commit/9fdf85a518db8c429ef3a88206a122ac9cbeac01
It does seem to work. However, the position of the arguments (do they come before or after the command?) is still a bit opaque, e.g.:
looper --looper-config .looper.yaml --output-dir "your/output/dir" run --dry-run --compute mem='32000' --exc-flag "failed" "running"
I think I will proceed to place all of the arguments such that they come after the command. This will make the syntax similar to the previous CLI such that it will begin with looper run
.
Is there any reason we should not structure it in this way? We will will have copies of the shared arguments among the parsers but I believe that is the only downside.
Another option is to monkey-patch thepydantic2-argparse
. For example, we could require all list arguments to be given as comma-separated strings, and then add the following in cli_pydantic.py
right after the imports:
def parse_field(
parser: argparse.ArgumentParser,
field: pydantic.fields.ModelField,
) -> Optional[utils.pydantic.PydanticValidator]:
"""Adds container pydantic field to argument parser.
Args:
parser (argparse.ArgumentParser): Argument parser to add to.
field (pydantic.fields.ModelField): Field to be added to parser.
Returns:
Optional[utils.pydantic.PydanticValidator]: Possible validator method.
"""
import pydantic2_argparse.utils as utils
class SplitArgs(argparse.Action):
def __call__(self, parser, namespace, values, option_string=None):
setattr(namespace, self.dest, values.split(','))
# Add Container Field
parser.add_argument(
utils.arguments.name(field),
action=SplitArgs,
help=utils.arguments.description(field),
dest=field.alias,
metavar=field.alias.upper(),
required=bool(field.required),
)
# Construct and Return Validator
return utils.pydantic.as_validator(field, lambda v: v)
pydantic2_argparse.argparse.parser.parsers.container.parse_field = parse_field
This is ugly, but would have the advantage that the actual command / argument hierarchy is maintained. If we do this, then of course the mandatory comma separation (e.g. --exc-flag=running,failed
) would need to be very aggressively documented in the help string.
Putting the majority of shared arguments after the subcommand would also be possible, of course. As you say, there's then a duplication of arguments across various subcommands. But then again, it might even be preferred if you had a good reason to do so for the existing CLI. Note that then some of our changes would need to be reversed. In the end, I think it's as you and the users prefer :slightly_smiling_face:
Removing likely-solved
after today's discussion. Cause: lack of short form arguments.
This was brought up previously in a PR: https://github.com/pepkit/looper/pull/448#issuecomment-1921642806
But to reiterate here: pydantic-argparse
/pydantic2-argparse
does not support short-form arguments at this time and that is currently undesirable.
We will hold on releasing the next version of Looper (1.8.0) until we incorporate the short-form arguments.
It appears that clipstick
does allow short arguments and uses Pydantic 2:
https://sander76.github.io/clipstick/usage.html#keyword-arguments
I believe it was mentioned in a meeting that we decided not to use clipstick for some reason, but I cannot seem to track down my notes on why that was decided.
I'm sorry about the late realization that short arguments are not supported by pydantic-argparse
! We probably should have pointed that out more explicitly, rather than only in a comment on PR :slightly_frowning_face:
As for clipstick
, I think what I didn't like about it was that (quote from the documentation)
Next to that I wanted to try and build my own parser instead of using Argparse because… why not.
That doesn't exactly inspire confidence with me, but that's just opinionated on my side.
Yes, clipstick doesn't use argparse, which I considered a fatal downside, since we use argparse in all our other projects so we're familiar with it.
Ok, with the latest PR I've added the short arguments. Marking this as likely solved.
As described in #433, we would like to add a HTTP API for
looper
. The challenge here is to keep HTTP API and CLI in sync, and in a call, @nsheff, @zz1874 and I decided to address this by replacing theargparse
-based CLI command / argument definition with a definition based onpydantic
models: several libraries (for example,pydantic-argparse
,tyro
andclipstick
) allow to build CLI parsers with type checking frompydantic.BaseModel
definitions.So we agreed that our first task towards a HTTP API for
looper
is to redefine import CLI commands (looper {run,runp,check}
) viapydantic
models. Building an HTTP API that consumes data compliant with these models is then straightforward.For now, we assume we will be using
pydantic-argparse
, and, as agreed with @nsheff, will buildpydantic
models that are compatible with it. This issue and comments below track the progress of this task.