mozilla / active-data-recipes

A repository of various activedata queries and recipes
Mozilla Public License 2.0
8 stars 24 forks source link

Define recipe queries and arguments in a structured manner #40

Closed ahal closed 5 years ago

ahal commented 5 years ago

Currently a recipe is just a python file with a run function in it. Defining and handling arguments is done inside this run function via argparse, so it's hard to extract this information (e.g for the web app).

I'd like to invent a more structured and strict format for recipes. We could either move to a yml format (similar to how mozlint linters work), or else just stick with the python scripts, but expect certain global configuration variables to be present. I'm leaning towards the latter.

Some things that should be define more structured:

Basically, writing a new recipe should only require the logic to post-process the query data. We shouldn't need to manually do things like create parsers or invoke the queries.

kerlynNkep commented 5 years ago

Hi @ahal , I'm an outreachy applicant and this project looks really interesting and am looking forward to working on it. I am kindoff new to open source and I believe this will go a long way to improve my experience. I will start working on this issue.

ahal commented 5 years ago

Hi @kerlynNkep, I think this is an issue that might not be great for contributors, mostly because it will be a fairly major refactor and require a bunch of design work. On top of that I'm not really sure what the ideal solution is, so it will be hard for me to guide you in the right direction. Sorry for not making that clear up front (I added a new label to help make it a bit more obvious).

If you are confident you can work through it (and potentially start over a few times) you are welcome to attempt it, but I'd probably suggest working on something else instead. Issue #45 might be a good one.

Thanks for your interest!

kerlynNkep commented 5 years ago

Hi @ahal, thanks for your reply. Since am actually new to the codebase ill take your suggestion and work on the referred issued but ill be back for this issue after i must have gained experience and confidence with the code base.

Thanks

MadinaB commented 5 years ago

Hi, @ahal!

Hope you are doing well! I know that there is already one PR for this task, but I think solving it this way: https://github.com/mozilla/active-data-recipes/compare/master...MadinaB:issue40 will be very laconic and this can also be considered as one possible option.

The idea is in using docopt. I have found about this lib while I was trying tobuild the subparsers unconditionally as you told me here. (I did not forgot about this, but did not have enough time to come with satisfying solution yet. It made me pretty confident that unconditional subparsers fitting our needs can not be done via argparse..)

So the idea with docopt is that it returns dict of parsed arguments and the input is the arguments itself and one string telling how to parse args.

For concretely this commit with active_data query it uses this string:

argument_parser = """
    Usage:
      active_data [--from <from_date>] [--to <to_date>]

    Options:
      --from=<from_date>  Starting date to pull data from, defaults to a week ago. [default: today-week]
      --to=<to_date>  Ending date to pull data from, defaults to now. [default: eod]
""" 

This short command returns dict with parsed arguments. Done.

docopt(args, argv=arguments)

and is similar to what we do in 3 lines:

parser = RecipeParser()
args = parser.parse_args(args)
query_args = vars(args)

So in general the idea is:

  1. Provide each query with configuration string.
  2. Save configuration strings along with queries in .query file.
  3. Do argument parsing in run_query method in query.py file:
    • Simply read argument_parser and query from yaml in load_query:
      
      with open(os.path.join(QUERY_DIR, name + '.query')) as fh:
      for record in yaml.load_all(fh):
          yield record["argument_parser"], record["query"]
do in`run_query`:

context = docopt(argument_parser, argv=args)


and `query = jsone.render(query, context)`
will do all as expected further

---
I think this one could be a good option since similar [docopt](https://github.com/docopt) modules exist for many languages and passing this configuration script is self-descriptive and easily reproducible for other apss
ahal commented 5 years ago

This is an interesting solution! I have to say, I'm really impressed with the thought you put into this problem and the approach you settled on.

Though please be aware that the applicant who submitted the other PR and I have still been working on this problem out of band, so if you'd like to continue on this there's a chance your work won't end up being merged. I'm not trying to discourage you, but just wanted to let you know before you spend too much more time on it. We actually came to the same conclusion as you, that arguments should be defined in the queries somehow (rather than in the recipes).

Now back to your PR.

I think rather than defining a docopt string in the query files, I'd like to make things more structured. I.e, I want the possible arguments that a query can use to be well defined in an external file (maybe just query.py for now). Then, when a query uses a context value, we can first check that the value is allowed and then automatically add the related argument to the parser. I have used docopt before, and it is a really nice module. But one thing it isn't great at is composing arguments together (concatenating strings is messier than merging data structures). So I think for this case I'd lean towards sticking with argparse, even though its API is not great..

But your higher level approach is great. Arguments should absolutely be tied to the queries rather than the recipes. I think we'll still need the ability for recipes to specify their own arguments for the post-processing phase. Here are some edge cases to think about:

ahal commented 5 years ago

Also just re-iterating that I wasn't kidding when I labelled this as a difficult issue :). But I'm happy with the progress made here so far and the direction this is heading!