Closed by s3alfisc 8 months ago
Thought about this for a bit and concluded it is likely intended behavior =)
Hi @s3alfisc,
Just wanted to clarify a few things here:
- `model_matrix("a ~ C(b)", df)` and `Formula("a ~ C(b)").get_model_matrix(df, context=locals())` will give identical output (unless you are using globals too).
- `Formula(nested="a ~ C(b)").get_model_matrix(df, context=locals())` will have a nested model matrix at `.nested` which does not have the intercept. This is because in most cases where this is used, it is unlikely you want duplicate intercepts. Note that multipart formulae like `Formula("y ~ x1 | x2 | x3")` do not have this behaviour (the rhs formulae will all have intercepts).
- This behaviour can be changed via the `_parser` and `_nested_parser` arguments to `Formula`; e.g. `Formula(nested="a ~ C(b)", _nested_parser=DefaultFormulaParser(include_intercept=True))`. You can read more in the `Formula` docstring.
- A `Formula` instance can also be passed as the nested value: `Formula(nested=Formula("y~x"))`.
Hope that helps.
Hi @matthewwardrop, I am wondering if the following behavior is a bug or not: `model_matrix()` and `Formula.get_model_matrix()` handle reference levels for categoricals differently: `model_matrix` includes an intercept and drops a reference level by default, while `get_model_matrix()` does not.

If intended behavior, is it possible to mimic `model_matrix()` behavior for `Formula.get_model_matrix()`?