samrudh closed this 4 years ago
Thanks for this, @samrudh
I understand the motivation behind this functionality, but the implementation might have to be done more carefully. For example, using the 'str.replace' method to change a token into a "variable" won't work in general. Consider the following example.
sentence = 'The value of Pi up to 3 decimal places is 3.142.'
# which _ideally_ gets templatized as follows:
template = """
{% from math import pi %}
The value of Pi up to 3 decimal places is {{ round(pi, 3) }}
"""
Now we want to convert the token 3 into a variable which can be controlled from the outside. In order to do this, I cannot do template.replace because the token 3 occurs in more than one place!
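To make the problem concrete, here is a quick sketch in plain Python. The variable name `n_dec` is only an illustration, not something the library defines.

```python
sentence = 'The value of Pi up to 3 decimal places is 3.142.'

# The character '3' appears twice: once as the standalone token and once
# inside the number 3.142.
occurrences = sentence.count('3')

# A naive string replacement rewrites both places and corrupts the number.
naive = sentence.replace('3', '{{ n_dec }}')

print(occurrences)
print(naive)
```

Passing `count=1` to `str.replace` happens to hit the right occurrence here, but only because the token we want comes first. It gives no way to target the second one.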
However, we can do this with spacy. In a spacy doc, every token has a unique index (its position in the doc), regardless of how often the text of that token appears in the document. Therefore, it's better to attach a variable template to a spacy token instead of a Python string.
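A rough sketch of the idea, without depending on spacy: the regex tokenizer below is only a stand-in I made up for illustration (spacy's own tokenizer is more involved, and there the position is available as `token.i`). The point is just that addressing tokens by position removes the ambiguity.

```python
import re

sentence = 'The value of Pi up to 3 decimal places is 3.142.'

# Stand-in tokenizer (an assumption, not spacy's): numbers, words, or single
# punctuation marks. Each token is addressed by its position in the list.
tokens = re.findall(r'\d+(?:\.\d+)?|\w+|[^\w\s]', sentence)

# Two tokens start with '3', but they sit at different positions.
positions = [i for i, tok in enumerate(tokens) if tok.startswith('3')]

# Replace only the token at the first of those positions.
tokens_out = list(tokens)
tokens_out[positions[0]] = '{{ n_dec }}'

print(positions)
print(tokens_out)
```

The token `3.142` is left untouched because the replacement is keyed on the position, not on the text.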
In general, we need a mechanism to attach a template to a spacy document. The nlg.js library solves this problem differently, since there you can select a piece of text in the UI and add a template formula for it - so there is no ambiguity about which substring the template formula replaces. Let me add a similar interface here in the Python module, and then we can use this. Please keep this PR open and I'll update you.
Makes sense. So how are you thinking about the interface? Something like this?
def add_manual_template(spacy_doc_index, manual_template): ...
Hi @samrudh
Here's the interface from our call:
text = "something"
fh_args = {///}
df
template = templatize(text, df, fh_args)
from nlg import Template
from spacy import load
nlp = load('/')
doc = nlp(text)
tmpl = template.templatize()
doc = nlp('Value of pi up to 3 decimals is 3.142')
template = Template(doc, df, fh_args)
template.replace(5, 'n_dec')  # token 5 is the standalone "3"
template.replace(len(doc) - 1, 'round(pi, n_dec)')  # the last token is "3.142"
template.set_variable_value('n_dec', 3)
template.templatize()
'''
{% set n_dec = 3 %}
Value of pi up to {{ n_dec }} decimals is {{ round(pi, n_dec) }}
'''
template.render(df=df, n_dec=3)
Value of pi up to 3 decimals is 3.142
I'll create an initial stub for the Template class, and we can work on populating the logic then.
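For reference, a stub along these lines could look roughly like the sketch below. This is only my guess at the shape, not the actual nlg implementation: it takes a plain list of token strings instead of a spacy doc, ignores `df` and `fh_args`, and leaves out `render` entirely.

```python
class Template:
    """Hypothetical stub: maps token positions to template expressions."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.replacements = {}  # token index -> template expression
        self.variables = {}     # variable name -> default value

    def replace(self, index, expression):
        # Attach an expression to one specific token, identified by position.
        if not 0 <= index < len(self.tokens):
            raise IndexError(f'no token at index {index}')
        self.replacements[index] = expression

    def set_variable_value(self, name, value):
        self.variables[name] = value

    def templatize(self):
        # One {% set %} line per variable, then the sentence with the
        # selected tokens swapped for {{ expression }} placeholders.
        header = [f'{{% set {name} = {value!r} %}}'
                  for name, value in self.variables.items()]
        body = ' '.join(
            '{{ ' + self.replacements[i] + ' }}' if i in self.replacements else tok
            for i, tok in enumerate(self.tokens)
        )
        return '\n'.join(header + [body])


tokens = 'Value of pi up to 3 decimals is 3.142'.split()
tmpl = Template(tokens)
tmpl.replace(5, 'n_dec')
tmpl.replace(len(tokens) - 1, 'round(pi, n_dec)')
tmpl.set_variable_value('n_dec', 3)
result = tmpl.templatize()
print(result)
```

Joining tokens with a single space is a simplification. A version built on spacy would presumably reuse each token's original trailing whitespace.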
@samrudh Can you please add me as collaborator to your fork? That way I can push some commits to this PR and we can continue from there.
For the below two issues:
https://github.com/gramener/gramex-nlg/issues/7
https://github.com/gramener/gramex-nlg/issues/8