matthewwardrop / formulaic

A high-performance implementation of Wilkinson formulas for Python.
MIT License
347 stars 25 forks source link

fix: allow using column names without quotes if parentheses and brackets are balanced. #214

Closed matthewwardrop closed 4 days ago

matthewwardrop commented 4 days ago

Currently formulaic's tokenizer does not accept square brackets in unquote terms, but does accept parentheses (updating the token to a python term). This patch makes square brackets be interpreted the same as parentheses, allowing unquoted "indexing" of names, and upgrading them to Python tokens. Since this Python does not have to be syntactically valid in some contexts, e.g. in linear constraints, this has the desired behaviour of being able to use column names as the variables names without having to quote the columns. In formulae, this will allow indexing of variables for use as columns, which is also a nice boon.

Quoting will still be required if the parentheses or square brackets are imbalanced. for example: a[( will not pass tokenization; but a[(a)b] will; even if it is not syntatically valid Python. If this code is actually run, a SyntaxError will be raised at that point.

closes: #211