matthewwardrop / formulaic

A high-performance implementation of Wilkinson formulas for Python.
MIT License

Intercept term breaks when RHS formula begins with a parenthesis #129

Closed: slaskey23 closed this issue 1 year ago

slaskey23 commented 1 year ago

If you specify a formula where the intercept is implied and the RHS begins with a parenthesis, fm.model_matrix() fails with formulaic.errors.FormulaSyntaxError: Missing operator between 1 and x1.

Code to reproduce:

import formulaic as fm
import pandas as pd

df = pd.DataFrame(
    {
        'x1': [1, 2, 3],
        'x2': [1, 1, 1],
        'y': [5, 4, 3],
    }
)

# works
f1 = 'y ~ x1 + x2'
fm.model_matrix(f1, df)

# works
f2 = 'y ~ 1 + x1 + x2'
fm.model_matrix(f2, df)

# works
f3 = 'y ~ 1 + (x1 + x2)'
fm.model_matrix(f3, df)

# breaks: raises FormulaSyntaxError (Missing operator between 1 and x1)
f4 = 'y ~ (x1 + x2)'
fm.model_matrix(f4, df)

matthewwardrop commented 1 year ago

Hi @slaskey23! Thanks for reporting. This has been fixed on the main branch, but I haven't put out a release for a while. It's been waiting for some work to land, and then... well... life. I'll try to put out a release soon.

Let me know if you find anything else!

slaskey23 commented 1 year ago

Thanks for the quick turnaround! It's an easy workaround in my code for now, so no worries.
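
The thread does not spell out the workaround. One possibility, based on the f3 case in the report (which works), is to make the intercept explicit whenever the RHS starts with a parenthesis. The sketch below is only an illustration: with_explicit_intercept is a hypothetical helper, not part of formulaic, and it assumes a simple single-part formula that does not remove the intercept elsewhere.

```python
def with_explicit_intercept(formula: str) -> str:
    """Hypothetical helper (not a formulaic API).

    Prepends '1 + ' to the right-hand side when it begins with '(',
    mirroring the 'y ~ 1 + (x1 + x2)' form that works in the report.
    Formulas whose RHS does not start with '(' are returned unchanged.
    """
    # Split on the last '~'; if there is none, everything is treated as RHS.
    lhs, sep, rhs = formula.rpartition("~")
    stripped = rhs.lstrip()
    if not stripped.startswith("("):
        return formula
    # Keep a space after '~' when a left-hand side is present.
    pad = " " if sep else ""
    return f"{lhs}{sep}{pad}1 + {stripped}"


print(with_explicit_intercept("y ~ (x1 + x2)"))  # y ~ 1 + (x1 + x2)
print(with_explicit_intercept("y ~ x1 + x2"))    # y ~ x1 + x2
```

The rewritten string can then be passed to fm.model_matrix() in place of the original formula. Once a release containing the fix from the main branch is installed, a helper like this should no longer be needed.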