Closed s3alfisc closed 8 months ago
In R, one could reach the desired outcome by wrapping the factor / categorical in as.numeric():
data(mtcars)
mtcars[, "hp"] = as.factor(mtcars[, "hp"])
sapply(mtcars, class)
# mpg cyl disp hp drat wt
# "numeric" "numeric" "numeric" "factor" "numeric" "numeric"
# qsec vs am gear carb
# "numeric" "numeric" "numeric" "numeric" "numeric"
model.matrix(mpg ~ hp, mtcars) |> dim() # [1] 32 22
model.matrix(mpg ~ as.numeric(hp), mtcars) |> dim() # [1] 32 2
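One caveat worth keeping in mind: as.numeric() on an R factor returns the integer level codes, not the original values. A minimal pandas sketch of that distinction follows; the toy hp values are made up for illustration and are not taken from mtcars.

```python
import pandas as pd

# Toy categorical column (illustrative values only).
hp = pd.Series([110, 93, 175, 110], dtype="category")

# 1-based level codes, analogous to as.numeric(<factor>) in R.
# Categories are sorted, so 93 -> 1, 110 -> 2, 175 -> 3.
codes = hp.cat.codes + 1

# The original numeric values, recovered by casting the categorical back.
values = hp.astype("int64")

print(codes.tolist())   # [2, 1, 3, 2]
print(values.tolist())  # [110, 93, 175, 110]
```

Either column is a single numeric vector, so neither would be expanded into one dummy column per level; which one is wanted depends on whether the level order or the underlying value matters.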
Ah, of course it works with contexts:
from formulaic import Formula
import pyfixest as pf
import pandas as pd
def to_numeric(x):
    # Convert the categorical column back to plain numbers so it is not one-hot encoded.
    return pd.to_numeric(x)
data = pf.get_data()
data["f1"] = data["f1"].astype("category")
Formula(lhs = "Y", fixef = "to_numeric(f1) + f2").get_model_matrix(data, context = {"to_numeric": to_numeric})
.fixef:
to_numeric(f1) f2
1 6.0 21.0
3 1.0 10.0
4 19.0 20.0
5 13.0 3.0
6 2.0 16.0
.. ... ...
995 14.0 23.0
996 19.0 17.0
997 3.0 5.0
998 18.0 20.0
999 4.0 19.0
Cool! =)
A bit late, and a bit data-specific, but you could also use: Formula(lhs = "Y", fixef = "f1.to_numeric() + f2").
I wonder whether we should add something like O(), N(), and raw() for explicit ordinal, numerical, and passthrough encodings respectively?
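To make the idea concrete, here is a rough sketch of what user-defined ordinal and numerical helpers could look like today. The names O and N are borrowed from the suggestion above, the bodies are purely hypothetical, and nothing here is part of formulaic's API. Presumably such functions could be handed to get_model_matrix() through the context argument, in the same way as to_numeric earlier in the thread, but that is an assumption rather than something tested here.

```python
import pandas as pd

def N(x):
    """Hypothetical numerical encoding: cast the values to float."""
    return pd.Series(x).astype("float64")

def O(x):
    """Hypothetical ordinal encoding: 0-based position of each value's category."""
    return pd.Series(x).astype("category").cat.codes

# Toy categorical column (illustrative values only).
f1 = pd.Series([6, 1, 19, 1], dtype="category")

print(N(f1).tolist())  # [6.0, 1.0, 19.0, 1.0]
print(O(f1).tolist())  # [1, 0, 2, 0]
```

A passthrough raw() is left out on purpose: returning the categorical unchanged from a plain function would likely still be picked up as categorical, so it would probably need support inside the library itself.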
Hi @matthewwardrop, I just realized all the power given to me by the Formula class - it's really super neat and will save me quite a few lines of code! For my use case, I'd like to force Formula.get_model_matrix() to not expand categorical variables for a particular input formula "fixef". Is it possible to achieve this easily?

Here is a quick example:

If f1 is of type pd.Categorical, .get_model_matrix() applies the standard one-hot encoding. But I'd like to return fixef as a non-encoded data frame, i.e. I'd like the output to look as if f1 and f2 were integers:

Is this possible to achieve?

Best, Alex