matthewwardrop / formulaic

A high-performance implementation of Wilkinson formulas for Python.
MIT License

`model_spec.transform_state` bugged when formula is not correctly written #165

Closed arturodcv closed 6 months ago

arturodcv commented 7 months ago

If the given formula is written with extra or missing spaces, the keys of the `transform_state` dictionary do not match and the dictionary comes back as `{}`. Here is an example:

```python
import numpy as np
import pandas as pd
from formulaic import Formula

df = pd.DataFrame({
    'y': np.linspace(1, 5, 10),
    'x': np.linspace(0, 1, 10),
})

# Well-written formula
y, X = Formula('y ~ bs(x, df=4) ').get_model_matrix(df)
print(X.model_spec.__dict__["transform_state"])
```

results in `{'bs(x, df=4)': {'lower_bound': 0.0, 'upper_bound': 1.0, 'knots': [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]}}`, while

```python
# Formula not well written (note the whitespace)
y, X = Formula('y ~ bs( x, df = 4) ').get_model_matrix(df)
print(X.model_spec.__dict__["transform_state"])
```

results in `{}`.

I am using Formulaic version 0.6.6.

matthewwardrop commented 7 months ago

Thanks for reporting, @arturodcv! This is indeed a subtle bug that I likely wouldn't have caught. Will get it straightened out soon.
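
For anyone reading along, here is a minimal sketch of how a whitespace mismatch between dictionary keys can produce the empty result reported above. It is not Formulaic's code, and the thread does not confirm the actual cause. The helper `canonical` and the toy `state` dictionary are hypothetical, and the sketch assumes Python 3.9+ for `ast.unparse`. It only shows that two spellings of the same call are different strings until they are normalised, so state written under one spelling is invisible to a lookup under the other.

```python
import ast


def canonical(expr: str) -> str:
    """Hypothetical helper: round-trip a Python expression through the
    AST so that whitespace differences disappear (needs Python 3.9+)."""
    return ast.unparse(ast.parse(expr.strip(), mode="eval"))


raw = "bs( x, df = 4)"   # spelling used in the second formula above
tidy = "bs(x, df=4)"     # spelling used in the first formula above

# Toy stand-in for a state store: written under the normalised spelling...
state = {canonical(raw): {"lower_bound": 0.0, "upper_bound": 1.0}}

# ...so a lookup with the raw spelling misses,
print(state.get(raw))

# while a lookup with the normalised spelling finds the entry.
print(state.get(canonical(raw)))
```

Until a fix lands, writing transform calls in the same spelling that the working example uses (`bs(x, df=4)`) sidesteps the mismatch, as the first snippet in the report shows.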