vinci1it2000 / formulas

Excel formulas interpreter in Python.
https://formulas.readthedocs.io/
European Union Public License 1.1
342 stars 75 forks source link

How to define custom variable placeholder #104

Closed ccrvlh closed 8 months ago

ccrvlh commented 1 year ago

We are implementing a formula builder for the end user. One of the things we need to solve for is how to identify variables, get the correct data for each of the defined variables and then run the function.

One example: = {{ a222ec9ba7874 }} * {{ a222ec9bbf0751 }}

In this case: we use {{ and }} as delimiters for the variables (everything inside it is a variable). We will have a list of variables, get the data (we are using numpy arrays) from the database, and then make the calculation normally.

I understand that a custom delimiter definition may conflict with Excel's own token, but I couldn't understand how is a variable being identified at the moment. This variable a222ec9ba7874 is a hex representation of an integer (our identifier), but only the hex representation is not understood (Formula Error: Not a valid formula), we then added an a and it seems to fix it, but we couldn't quite understand the rules for the system to interpret a given token as a variable or not.

Besides having a clear rule/strategy to define a variable, it would be great to be able to customize the characters. PyParsing does that in a simple way, for example using variable_code = pp.QuotedString(quote_char="{{", end_quote_char="}}") . We could adapt, and for example use VAR_ a prefix for any variable. The flexibility of customizing the delimiters could help making things a bit easier on the frontend though, so would be a bonus.

Ideas on how to solve this or possible approaches? Thanks for sharing, amazing lib.


Edit: it seems that when I use VAR_XYZ123 it's identifying the Token as a Range, and as I'm using Numpy arrays as variables, it works perfectly. The regex pattern that decides whether something is a range or not is quite complex. I guess to be able to add custom chars it would be necessary to append a new regex. Not sure it will make sense, since it would deviate from the "Excel" purpose to a more generic formula engine.

ccrvlh commented 1 year ago

Would it make sense to open the Discussions feature to avoid Q&A style issues? @vinci1it2000 would you be ok with some docstrings and typing? https://github.com/vinci1it2000/formulas/compare/vinci1it2000:formulas:master...lowercase00:formulas:compile_helper

vinci1it2000 commented 8 months ago

With the next release you can use the following script.

>>> import regex
... from formulas.parser import Parser
... from formulas.tokens.operand import Range
>>> class MyOperand(Range):
...     _re = regex.compile(
...         r'(?P<ref>{{[^}]*}})',regex.IGNORECASE | regex.X | regex.DOTALL
...     )
>>> class MyParser(Parser):
...     filters = [MyOperand] + Parser.filters
>>> func = MyParser().ast(
...     '= {{ a222ec9ba7874 }} * {{ a222ec9bbf0751 }}'
... )[1].compile()
wiryonolau commented 7 months ago

With the next release you can use the following script.

>>> import regex
... from formulas.parser import Parser
... from formulas.tokens.operand import Range
>>> class MyOperand(Range):
...     _re = regex.compile(
...         r'(?P<ref>{{[^}]*}})',regex.IGNORECASE | regex.X | regex.DOTALL
...     )
>>> class MyParser(Parser):
...     filters = [MyOperand] + Parser.filters
>>> func = MyParser().ast(
...     '= {{ a222ec9ba7874 }} * {{ a222ec9bbf0751 }}'
... )[1].compile()

Hi is there an example on how to pass the a222ec9ba7874 and a222ec9bbf0751 variable ?