pydata / patsy

Describing statistical models in Python using symbolic formulas

Patsy Error: TypeError: 'Series' object is not callable | Cannot Set Reference Level with DataFrame #195

Closed: jacob-r-anderson closed this issue 1 year ago

jacob-r-anderson commented 1 year ago

I have a DataFrame with a column "Days" and another column "Age_Bracket". Days is an integer column and Age_Bracket contains strings. I want to run a linear regression where the "Age_Bracket" category "Thirty" is the reference level.
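For a self-contained reproduction, here is a minimal sketch with a made-up `df1` (the Days values and the level names "Twenty" and "Forty" are hypothetical; only "Thirty" comes from the report). It calls patsy's `dmatrix` directly, since statsmodels releases built on patsy hand the formula string to patsy. It shows why the reference has to be set explicitly: by default, `C()` uses treatment coding with the first level in sorted order as the reference, so here "Forty" is the baseline, not "Thirty".

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for df1: integer Days, string Age_Bracket.
df1 = pd.DataFrame({
    "Days": [3, 5, 4, 8, 7, 6],
    "Age_Bracket": ["Twenty", "Thirty", "Forty", "Twenty", "Thirty", "Forty"],
})

# Default treatment coding: levels are sorted (Forty, Thirty, Twenty) and the
# first one, "Forty", is dropped as the reference.
default = dmatrix("C(Age_Bracket)", df1)
print(default.design_info.column_names)
# ['Intercept', 'C(Age_Bracket)[T.Thirty]', 'C(Age_Bracket)[T.Twenty]']
```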

When I run this:

```python
import statsmodels.formula.api as smf
res = smf.ols(formula="Days ~ C(Age_Bracket)", data=df1).fit()
res.summary()
```

It works fine. But if I try to specify a reference level, I get the error below:

```python
res = smf.ols(formula="Days ~ C(Age_Bracket(reference='Thirty'))", data=df1).fit()
res.summary()
```

```
PatsyError: Error evaluating factor: TypeError: 'Series' object is not callable
    Days ~ C(Age_Bracket(reference='Thirty'))
```

I have also tried a syntax that more closely matches the documentation (https://patsy.readthedocs.io/en/latest/API-reference.html#patsy.Treatment), but this gives the same error:

```python
res = smf.ols(formula="Days ~ C(Age_Bracket('Thirty'))", data=df1).fit()
res.summary()
```

```
PatsyError: Error evaluating factor: TypeError: 'Series' object is not callable
    Days ~ C(Age_Bracket('Thirty'))
           ^^^^^^^^^^^^^^^^^^^^^^^^
```

tomicapretto commented 1 year ago

You are trying to call Age_Bracket, but it's a Series, which is not callable. The syntax you want is:

```
"C(Age_Bracket, Treatment('Thirty'))"
```
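A minimal sketch of both the failure and the fix, again with a made-up `df1` (values and the "Twenty"/"Forty" levels are hypothetical). In the failing formula, patsy evaluates `Age_Bracket(...)` as ordinary Python, so it tries to call the Series itself. In the corrected formula, `Age_Bracket` is the first argument to `C()` and `Treatment('Thirty')` is a separate contrast argument. The least-squares fit via `numpy.linalg.lstsq` is only a stand-in for `smf.ols`, used here to keep the sketch to patsy and NumPy.

```python
import numpy as np
import pandas as pd
from patsy import PatsyError, dmatrices

# Hypothetical stand-in for df1: integer Days, string Age_Bracket.
df1 = pd.DataFrame({
    "Days": [3, 5, 4, 8, 7, 6],
    "Age_Bracket": ["Twenty", "Thirty", "Forty", "Twenty", "Thirty", "Forty"],
})

# Failing form: Age_Bracket(...) is a call on the Series, which raises a
# TypeError that patsy wraps in a PatsyError.
error_message = ""
try:
    dmatrices("Days ~ C(Age_Bracket(reference='Thirty'))", df1)
except PatsyError as err:
    error_message = str(err)
print(error_message.splitlines()[0])
# Error evaluating factor: TypeError: 'Series' object is not callable

# Working form: the column, then the contrast, as two arguments to C().
y, X = dmatrices("Days ~ C(Age_Bracket, Treatment('Thirty'))", df1)
print(X.design_info.column_names)

# The keyword spelling Treatment(reference='Thirty') builds the same matrix.
_, X_kw = dmatrices("Days ~ C(Age_Bracket, Treatment(reference='Thirty'))", df1)

# Least squares by hand: the intercept is the mean of the "Thirty" group and
# each other coefficient is that group's mean minus the "Thirty" mean.
coef, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y).ravel(), rcond=None)
print({name: round(float(c), 6) for name, c in zip(X.design_info.column_names, coef)})
# With the made-up data: Intercept 6.0, [T.Forty] -1.0, [T.Twenty] -0.5
```

Because this is the same formula string, it should also work unchanged in the original call, for example `smf.ols(formula="Days ~ C(Age_Bracket, Treatment('Thirty'))", data=df1).fit()`, with the summary then reporting Forty and Twenty relative to Thirty.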

jacob-r-anderson commented 1 year ago

That worked immediately. Thanks for the quick response! I personally found the documentation a bit confusing, though that may say more about me than about the docs. I think some users like me approach Patsy (at least through statsmodels.formula.api) as a way of working with DataFrame columns. So when reading 'dmatrix("C(a, Treatment(1))", balanced(a=3))', it was not clear which parts refer to a DataFrame, which to a column, and which to a special term (like Treatment). For example, I thought it might indicate two columns, "a" and "Treatment", with (1) in the "Treatment" column set as the reference.
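To unpack the documentation example above, here is a small sketch using only patsy. `balanced(a=3)` builds the data (a dict with one column, "a"), `a` inside the formula refers to that column, and `Treatment(1)` is a contrast object passed as the second argument to `C()`; there is no "Treatment" column. When an integer reference such as `1` does not match a level name, patsy treats it as a position in the sorted levels, so here it selects `a2`. The comparison with `Treatment('a2')` is added only for illustration.

```python
import numpy as np
from patsy import balanced, dmatrix

# balanced(a=3) is the *data*: a dict with a single column "a".
data = balanced(a=3)
print(data)  # {'a': ['a1', 'a2', 'a3']}

# "a" is looked up in data; Treatment(1) picks the reference by position,
# so level index 1 ("a2") is dropped from the dummy columns.
dm = dmatrix("C(a, Treatment(1))", data)
print(dm.design_info.column_names)
# ['Intercept', 'C(a, Treatment(1))[T.a1]', 'C(a, Treatment(1))[T.a3]']

# Naming the level explicitly gives the same numbers.
dm_named = dmatrix("C(a, Treatment('a2'))", data)
print(np.array_equal(np.asarray(dm), np.asarray(dm_named)))  # True
```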

tomicapretto commented 1 year ago

By the way, Patsy is no longer being developed. I recommend switching to formulaic.