Closed teucer closed 1 year ago
Hi @teucer ! Actually this is behaving as expected. In Wilkinson formulae, -1 and +0 both have the effect of removing the intercept. Is there something I'm missing?
We want to be able to say model_matrix(rhs)
. But this would add back the intercept for 2nd and 3rd cases.
Maybe model_matrix should handle it:
There could be a difference between 2 and 3. Right now they are handled the same way.
Hmmmm... I'm a bit confused as to what you are trying to achieve here.
There was a bug in formulaic.model_matrix
related to ingesting pre-processed formula that is now fixed, and will go out in the next point release, but basically you should be able to do this:
>>> import pandas
>>> from formulaic import Formula, model_matrix
>>> f = Formula("y ~ x - 1")
>>> f.rhs
x
>>> model_matrix(f.rhs, pandas.DataFrame({"x": [1,2,3]}))
x
0 1
1 2
2 3
Why are you re-processing string representations of the right hand side?
The issue is that in the current version
>>> model_matrix(f.rhs, pandas.DataFrame({"x": [1,2,3]}))
# -> FormulaInvalidError
>>> model_matrix("x", pandas.DataFrame({"x": [1,2,3]}))
Intercept x
0 1.0 1
1 1.0 2
2 1.0 3
Given the bug, we were trying to do with the string.
The more fundamental questions are, should there be a difference:
Formula("~x")
and Formula("x")
f = Formula("y ~ x - 1")
, between model_matrix("x", df)
and model_matrix(f.rhs, df)
My opinion:
Hi again @teucer ,
The bug is now fixed, and I can roll out a new version for you if this is blocking you; otherwise I'll wait a few more days so it can also include some categorical encoding enhancements.
Your proposal is not unreasonable, but for consistency with expectations set by patsy
and R, I'm going to make the executive choice not to treat ~x
and x
differently.
However, formulaic does (under the hood) support suppression of the addition of the intercept. For example, you could do:
>>> from formulaic import Formula
>>> from formulaic.parser import FormulaParser
>>> Formula(FormulaParser().get_terms("x", include_intercept=False))
x
It might be sensible to just add the include_intercept
option in the Formula
constructor, if that would be helpful to you.
Having include_intercept
in the constructor would be really helpful.
Hi again!
As we move toward 1.0.0 I'm going back through all the issues and deciding things one way or another. Here, I'm deciding not to include this in the Formula constructor; however, since this issue was created it was moved to the parser constructor. So you can now do:
from formulaic import Formula
from formulaic.parser import DefaultFormulaParser
Formula("x", _parser=DefaultFormulaParser(include_intercept=False))
Due to modularity, and conflation with the Structured
nature of Formula
, this is no longer something I think would be a good idea.
Thanks again for sharing your thoughts and helping make Formulaic great!
I'll close this one out for now.
I think there is a discrepancy in the output of rhs:
rhs = "x+1"
rhs = "x"
rhs = "x"
I would have expected to get
rhs = "x-1"
orrhs = "x+0"
for the second and third case.