gramener / gramex-nlg

Natural Language Generation for Gramex applications.

Intermediate representation of FormHandler arguments #18

Status: Closed (closed by jaidevd 4 years ago)

jaidevd commented 4 years ago

Do not tokenize, process, or search FormHandler arguments as they are. A leading hyphen (the descending-sort marker in `_sort`) corrupts the tokens. For example:

>>> import pandas as pd
>>> df = pd.read_csv('actors.csv')
>>> fh_args = {'_sort': ['-rating']}
>>> text = nlp('James Stewart is the actor with the highest rating.')
>>> n = templatize(text, fh_args, df)
>>> n.render(df)
b'James Stewart is the actor with the highest -rating.'

Instead, preprocess FormHandler arguments with gramex.data._filter_{sort, select, groupby}_columns to get a cleaner intermediate representation that leaves the tokens untouched, and use that representation in the template.
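
A rough sketch of what such an intermediate representation could look like for `_sort`. The helper `parse_sort_args` is hypothetical and written only for illustration; it does not call the private gramex.data helpers, whose signatures are not shown in this issue. It assumes a leading hyphen means descending order unless the full string is itself a column name.

```python
def parse_sort_args(values, columns=None):
    """Split each `_sort` value into a (column, ascending) pair.

    Hypothetical helper for illustration only, not part of gramex or nlg.
    If `columns` is given, a value that exactly matches a column name is
    kept as-is, so a column that really starts with '-' is not mangled.
    """
    parsed = []
    for value in values:
        if columns is not None and value in columns:
            parsed.append((value, True))
        elif value.startswith('-'):
            # Strip the DSL marker so the bare column name can be matched
            # against tokens in the text.
            parsed.append((value[1:], False))
        else:
            parsed.append((value, True))
    return parsed


fh_args = {'_sort': ['-rating']}
sort_spec = parse_sort_args(fh_args['_sort'], columns=['name', 'rating'])
print(sort_spec)
```

With something like this, the search would match the token "rating" against the bare column name, and the sort direction would be carried separately instead of leaking into the rendered text.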

jaidevd commented 4 years ago

This should also handle any other FormHandler DSL artifacts, like pipes:

fh_args = {'_by': ['species'], '_c': ['sepal_width|avg'], '_sort': ['sepal_width|avg']}
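
The same idea could be applied to pipes. Below is a minimal sketch, assuming the part after the last `|` is the aggregation name and everything before it is the column. `split_agg` and `clean_args` are hypothetical names used only for this example, not gramex or nlg functions.

```python
def split_agg(value):
    """Split 'column|agg' into (column, agg); agg is None if there is no pipe.

    Hypothetical helper for illustration only.
    """
    column, sep, agg = value.rpartition('|')
    if not sep:
        return value, None
    return column, agg


def clean_args(fh_args, keys=('_c', '_sort')):
    """Return a copy of fh_args with piped values split into pairs."""
    cleaned = dict(fh_args)
    for key in keys:
        if key in fh_args:
            cleaned[key] = [split_agg(v) for v in fh_args[key]]
    return cleaned


fh_args = {'_by': ['species'], '_c': ['sepal_width|avg'],
           '_sort': ['sepal_width|avg']}
cleaned = clean_args(fh_args)
print(cleaned)
```

Only the bare column name (`sepal_width` here) would then be searched for in the text, with the aggregation kept alongside it for use when rendering the template.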