gramener / gramex-nlg

Natural Language Generation for Gramex applications.

Variable template insertion should rely on spacy token indices, not str.replace #21

Open jaidevd opened 4 years ago

jaidevd commented 4 years ago

From the README example,

the auto-gen template turns out to be:

{% set fh_args = {"_by": ["species"], "_c": ["sepal_width|avg"], "_sort": ["sepal_width|avg"]}  %}
{% set df = U.gfilter(orgdf, fh_args.copy()) %}
{% set fh_args = U.sanitize_fh_args(fh_args, orgdf) %}
{# Do not edit above this line. #}
The {{ df["{{ fh_args['_by'][0] }}"].iloc[0] }} {{ fh_args['_by'][0] }} has the least average sepal_width.

The "virginica" token ends up as a nested template. The correct value is {{ df["species"].iloc[0] }}, but "species" is itself another variable (the very next word) with value {{ fh_args['_by'][0] }}, so the "species" inside the already-inserted template gets re-templatized as well. This happens because templates are added to the source text with str.replace here.

They should instead be replaced by changing spacy tokens and forming new spacy docs.
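
A minimal sketch of the difference, in plain Python so it runs without spaCy. The function names `naive_insert` and `insert_by_offsets` are hypothetical and not part of gramex-nlg, and the sentence is reconstructed from the template above. The character offsets are computed with `str.index` here only for illustration; in a spaCy-based fix they would presumably come from each token's `token.idx` and `len(token.text)`.

```python
def naive_insert(text, variables):
    """Reproduce the bug: each str.replace also rewrites text
    inside templates that earlier replacements inserted."""
    for literal, template in variables:
        text = text.replace(literal, template)
    return text


def insert_by_offsets(text, spans):
    """Replace (start_char, end_char, template) spans measured on the
    ORIGINAL text. Working right to left keeps earlier offsets valid."""
    for start, end, template in sorted(spans, reverse=True):
        text = text[:start] + template + text[end:]
    return text


text = "The virginica species has the least average sepal_width."
variables = [
    ("virginica", '{{ df["species"].iloc[0] }}'),
    ("species", "{{ fh_args['_by'][0] }}"),
]

# str.replace: "species" inside the first template is templatized again.
naive = naive_insert(text, variables)

# Offset-based: every span refers to the original text only
# (assumption: stand-in for spaCy token offsets).
spans = [
    (text.index(literal), text.index(literal) + len(literal), template)
    for literal, template in variables
]
fixed = insert_by_offsets(text, spans)

print(naive)
print(fixed)
```

Because every span is located on the original text before any substitution happens, inserted template text is never scanned again, which is the property the token-index approach is meant to provide.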