Closed kabirkhan closed 4 years ago
Looks like sacremoses already uses Click as a dependency so using Typer would be ideal as it's only dependency is Click and it provides a much nicer interface than Click using Python 3.6+ Type Hints
Also, since Typer is using Python 3.6+ Type Hints this would require dropping Python 3.5. While that's not ideal it is the general trend for a lot projects these days as the benefits of using type hints and proper dict ordering are really high when it comes to testing and general usability.
More discussion here about Python 3.5 and whether it's worth supporting since only about a little over 1% of installs come from Python 3.5: https://github.com/huggingface/transformers/issues/2608
So I don't think the issue is argument parsing, that code is pretty much declarative and will look the same in all implementations.
We have been experimenting with using lightning to simplify the interface for these models. Do you find the code here to be more readable?
https://github.com/huggingface/transformers/blob/master/examples/ner/run_pl_ner.py
I actually think we're talking about 2 different problems:
I've implemented a couple draft PRs for using Typer.
Typer reduces the amount of code significantly in both cases while also making the functions easier to read and understand.
There's a larger discussion about using CLI Arguments vs Options that should be had as well. Most of the examples are overly verbose to run using the existing CLI options.
For instance, in the run_generation.py example I migrated (PR 1 above) there are only 2 required options (model_type and model_name_or_path). I made these Typer Options to not break convention for now but they should both be arguments.
That way, instead of writing:
python examples/run_generation.py --model_type gpt2 --model_name_or_path distilgpt2
the user can write something like:
python examples/run_generation.py gpt2 distilgpt2
And the docstring for the function can document that these arguments are required. Typer automatically uses the docstring in the help.
So here's the automatic help docs for run_generation
python examples/run_generation.py --help
Usage: run_generation.py [OPTIONS] MODEL_TYPE MODEL_NAME_OR_PATH
Generate text based on a prompt using one of [gpt2, ctrl, openai-gpt,
xlnet, transfo-xl, xlm] as the model_type and a a supported model name or
path for that model_type
e.g.
$ python examples/run_generation.py gpt2 distilgpt2
Options:
--prompt TEXT
--length INTEGER
--stop-token TEXT Token at which text generation is stopped
--temperature FLOAT temperature of 1.0 has no effect, lower tend
toward greedy sampling
--repetition-penalty FLOAT primarily useful for CTRL model; in that
case, use 1.2
--k INTEGER
--p FLOAT
--padding-text TEXT Padding text for Transfo-XL and XLNet.
--xlm-language TEXT Optional language when used with the XLM
model
--seed INTEGER random seed for initialization
--no-cuda Don't use CUDA and run on CPU.
--num-return-sequences INTEGER The number of samples to generate.
--help Show this message and exit.
The work on transformers-cli (2) seems interesting as there are complex types there. I am pretty unconvinced on (1). The code reduction is mostly aesthetic, I don't see any really complexity wins. Given that I'm apt to stick with argparse as it is standard. (The argument/options thing could also be done in argparse. )
Thanks for the feedback
Actually I think it's more standard to use a CLI parsing dependency over argparse these days. Not a huge deal and it's not my library but I've just heard the same feedback about argparse in the examples from a few colleagues around Microsoft which is why I decided to propose the change.
If you do have some time to give a quick review on (2) that would be awesome. I think the changes there offer a lot of clarity particularly with using the Enum types.
@julien-c any thoughts on this? I don't think we want another dependency, but @kabirkhan did put a lot of work into restructuring CLI https://github.com/huggingface/transformers/pull/2974
My two cents, or maybe just one cent: I have always been torn with this, the same with plac. It feels more verbose than argparse but also, it doesn't.
Here, in this case, we currently already have the register_subcommand
s so I think Typer actually makes sense. Looking at the code, it does greatly reduce redundancy (except for the app.command() calls). However, it is a lot less known (I think) and I still feel it is less intuitive than good ol' argparse. So if this were a vote, I'd vote aye, keeping in mind that it might be a "learning curve". On top of that, all future CLI scripts should also use this library to streamline the experience and development. As a bonus, dropping 3.5 support is an excellent side-effect in my opinion (Typing, f-strings, ordered dicts), but one might want to check how many users are on 3.5.
@BramVanroy thanks for the input. Yeah the app.command() calls are annoying, fixed that with a for loop just now. I hear your point on the "learning curve" but Typer is truly very easy to learn and really well documented. It's built by tiangolo (same guy who made FastAPI). Also the current CLI already requires python 3.6
@BramVanroy less than 1% of pip installs are on Python 3.5 according to https://pypistats.org/packages/transformers β we will probably drop it in the next couple of weeks or months
@kabirkhan Thanks for the work you've put into those PRs! This is interesting, and analogous to ideas we've been discussing internally and here in other issues. More generally we are in the process of thinking about rebuilding our training example scripts in a more scalable/re-usable way. We'll link our first thoughts from here.
I'll close this as we now have a very simple (100 lines of code) built-in argument parser named HfArgumentParser.
Example scripts are now way cleaner after #3800.
Will also close associated PR #2974.
Would love to get your feedback @kabirkhan, thanks for prompting our reflection.
π Feature request
I'm pretty new to transformers and I'm finding the examples a bit hard to read. Mainly I think this is due to argparse being so verbose. Is there any interest in migrating to something like Plac or even better Typer? I think it would make all of the examples a lot easier to grasp right away and friendlier to newer users.
Motivation
It will make the examples easier to read and a lot shorter. Currently, in an example like https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py, just parsing CLI arguments takes 140 lines of the 799 lines of code (17.5 % of the total code).
Then the args object is passed around to all of the other functions and that becomes hard to deal with when first looking at the example and not having an understanding of exactly what values each function needs.
Your contribution
I'm happy to contribute the change across the transformers-cli and all the examples over time. Want to better understand the project requirements around adding a new dependency before submitting any PR.