jmboehm / GLFixedEffectModels.jl

Fast estimation of generalized linear models with high dimensional categorical variables in Julia

`fe(m)` seems broken compared to FixedEffectModels #59

Closed pdeffebach closed 2 months ago

pdeffebach commented 3 months ago

fe(m) doesn't return the contents of augmentdf when I think it is supposed to. Here is an MWE:

julia> df = DataFrame(a = rand(1:10, 200), x = rand(200), y = rand(200));

julia> m = reg(df, @formula(y ~ x + fe(a)), save = :fe);

julia> fe(m)[1:5, :]
5×1 DataFrame
 Row │ fe_a     
     │ Float64? 
─────┼──────────
   1 │ 0.47084
   2 │ 0.508088
   3 │ 0.446838
   4 │ 0.508088
   5 │ 0.546581

julia> m_nlreg = nlreg(df, @formula(y ~ x + fe(a)), Poisson(), LogLink(), save = [:fe]);

julia> fe(m_nlreg)
0×0 DataFrame

julia> m_nlreg.augmentdf[1:5, :]
5×1 DataFrame
 Row │ fe_a      
     │ Float64   
─────┼───────────
   1 │ -0.749437
   2 │ -0.679168
   3 │ -0.797734
   4 │ -0.679168
   5 │ -0.609455

pdeffebach commented 3 months ago

The bug is super annoying. My augmentdf has only one column, but we index it with

x.augmentdf[!, 2:size(x.augmentdf, 2)]

and with a single column that range is 2:1, which is empty.

FixedEffectModels.jl, by contrast, has the index of the fixed effect as the first column in its fe DataFrame object.

IMO, the best course of action is to add the index to augmentdf.
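
A minimal sketch of the indexing problem described above, assuming DataFrames.jl is installed. The variable `augmentdf` here is a stand-in built by hand with invented values, not the field of a fitted model:

```julia
using DataFrames

# Stand-in for an augmentdf holding a single saved fixed-effect column
# (values are invented for illustration).
augmentdf = DataFrame(fe_a = [-0.75, -0.68, -0.80])

# With one column the range is 2:1, which contains no elements.
cols = 2:size(augmentdf, 2)

# Selecting an empty column range returns a DataFrame with no columns.
sliced = augmentdf[!, cols]

println(cols, " is empty: ", isempty(cols))
println("size(sliced) = ", size(sliced))
```

Starting the range at 2 only makes sense if the first column is something other than a fixed effect, such as an identifier.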

jmboehm commented 2 months ago

Should be fixed by #60; let me know if not.

pdeffebach commented 2 months ago

Sure, but why not add the fixed effect identifiers? It's always scary using hcat rather than join to attach the fixed effects to the data frame after estimation.
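
To illustrate the hcat-versus-join concern, here is a rough sketch assuming DataFrames.jl. The tables `df` and `fe_by_id` and all values are hypothetical; in particular, a table of estimates keyed by identifier is what is being requested here, not something the package is known to return:

```julia
using DataFrames

# Hypothetical data: `a` is the fixed-effect identifier.
df = DataFrame(a = [2, 1, 2, 3], y = [0.1, 0.4, 0.3, 0.9])

# Hypothetical table of estimates keyed by identifier, one row per level of `a`.
fe_by_id = DataFrame(a = [1, 2, 3], fe_a = [-0.61, -0.68, -0.75])

# Match on the key, so each row gets the estimate for its own level of `a`
# regardless of how either table is sorted.
joined = leftjoin(df, fe_by_id, on = :a)

# hcat pairs rows purely by position, so it is only correct if both tables
# hold exactly the same rows in the same order (for example, no rows dropped
# or reordered during estimation).
positional = hcat(df, DataFrame(fe_a = [-0.68, -0.61, -0.68, -0.75]))
```

With identifiers available, a mismatch in row order or row count either has no effect on the join or shows up as missing values, instead of silently attaching estimates to the wrong rows.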