jmboehm / GLFixedEffectModels.jl

Fast estimation of generalized linear models with high dimensional categorical variables in Julia

Error with singleton observations #63

Open pdeffebach opened 2 months ago

pdeffebach commented 2 months ago

nlreg throws a BoundsError when the data contain singleton observations and save = [:fe] is set. It looks like GLFixedEffectModels.jl loses track of the number of observations: an index built for the full sample (101 rows) still seems to be applied after the singleton has been dropped (100 rows). Here is an MWE:

julia> using DataFrames, Distributions, GLM, GLFixedEffectModels

julia> df = DataFrame(y = rand(100), x = rand(100), g = rand(1:10, 100));

julia> push!(df, (y = 0.5, x = 0.5, g = 11));

julia> m = nlreg(df, @formula(y ~ x + fe(g)), Poisson(), LogLink(), save = [:fe])
[ Info: 1 observations detected as singletons. Dropping them ...
ERROR: BoundsError: attempt to access 100×1 Matrix{Float64} at index [101-element BitVector, 1:1]

The bug only occurs with save = [:fe], so it seems to be related to creating the augmentdf data frame. Without that keyword the same model fits:

julia> m = nlreg(df, @formula(y ~ x + fe(g)), Poisson(), LogLink())
[ Info: 1 observations detected as singletons. Dropping them ...
             Generalized Linear Fixed Effect Model             
Distribution:        "Poisson"   Link:                "LogLink"
Number of obs:             100   Degrees of freedom:         11
Deviance:               19.013   Pseudo-R2:                 NaN
Pseudo-Adj. R2:            NaN   Iterations:                  5
Converged:                true   
      Estimate Std.Error   t value Pr(>|t|) Lower 95% Upper 95%
x    -0.343138  0.516953 -0.663771    0.509  -1.35635   0.67007
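
The shape mismatch in the error message can be reproduced in isolation with plain arrays. This is only a sketch of the suspected mechanism, not the package's actual code: fe_values, esample_full, and keep are hypothetical names standing in for whatever nlreg uses internally when it builds augmentdf.

```julia
# Hypothetical stand-in for a per-observation result computed on the
# estimation sample: 100 rows remain after the singleton is dropped.
fe_values = rand(100, 1)

# Hypothetical stand-in for a row mask built against the original data,
# which still has 101 rows.
esample_full = trues(101)

# Logical indexing requires the mask length to match the indexed dimension,
# so a 101-element mask on a 100-row matrix throws.
err = try
    fe_values[esample_full, 1:1]
    nothing
catch e
    e
end
println(typeof(err))

# One possible way to keep sizes consistent (an assumption, not the
# maintainers' fix): allocate the output at the original length and fill
# only the rows that survived.
keep = trues(101)
keep[101] = false                      # the dropped singleton row
out = Matrix{Union{Float64, Missing}}(missing, 101, 1)
out[keep, :] = fe_values               # 100 kept rows receive the 100 values
println(count(ismissing, out))
```

If something like this is what happens, the fix would be to make the mask and the saved fixed-effect matrix refer to the same set of rows before indexing.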