pachadotdev / capybara

tldr; If you have a 2-4GB dataset and you need to estimate a (generalized) linear model with a large number of fixed effects, this package is for you.
http://pacha.dev/capybara/
Apache License 2.0
13 stars 0 forks source link

`bias_corr` error (nonconformable matrices) #8

Closed grantmcdermott closed 1 month ago

grantmcdermott commented 1 month ago

Sorry, I don't really have time to troubleshoot, but I just ran into this unexpected error. MWE below is slightly adapted from the alpaca tutorial (here: estimating a logit model rather than probit).

library(capybara)
data(psid, package = "bife")

lmod = feglm(
  LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME,
  data   = psid,
  family = binomial()
  )
summary(lmod)
#> Formula: LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME
#> 
#> Family: Binomial
#> 
#> Estimates:
#> 
#> |           | Estimate | Std. Error | z value  | Pr(>|z|)   |
#> |-----------|----------|------------|----------|------------|
#> | KID1      |  -1.1743 |     0.0984 | -11.9392 | 0.0000 *** |
#> | KID2      |  -0.5913 |     0.0862 |  -6.8578 | 0.0000 *** |
#> | KID3      |  -0.0157 |     0.0608 |  -0.2578 | 0.7966     |
#> | log(INCH) |  -0.4046 |     0.0943 |  -4.2892 | 0.0000 *** |
#> 
#> Significance codes: *** 99.9%; ** 99%; * 95%; . 90%
#> 
#> Number of observations: Full 13149; Missing 0; Perfect classification 7173 
#> 
#> Number of Fisher Scoring iterations: 5

bias_corr(lmod)
#> Error: element-wise multiplication: incompatible matrix dimensions: 5976x1 and 13149x1

To confirm, the equivalent code with alpaca runs.

library(alpaca)
data(psid, package = "bife")

lmod = feglm(
  LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME,
  data   = psid,
  family = binomial()
  )
summary(lmod)
#> binomial - logit link
#> 
#> LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME
#> 
#> Estimates:
#>           Estimate Std. error z value Pr(> |z|)    
#> KID1      -1.17434    0.09836 -11.939   < 2e-16 ***
#> KID2      -0.59134    0.08623  -6.858  6.99e-12 ***
#> KID3      -0.01566    0.06076  -0.258     0.797    
#> log(INCH) -0.40458    0.09433  -4.289  1.79e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> residual deviance= 6067.49,
#> null deviance= 8152.05,
#> n= 5976, l= [664, 9]
#> 
#> ( 7173 observation(s) deleted due to perfect classification )
#> 
#> Number of Fisher Scoring Iterations: 5

biasCorr(lmod)
#> binomial - logit link, l= [664, 9]
#> 
#>      KID1      KID2      KID3 log(INCH) 
#>  -1.02689  -0.51776  -0.01344  -0.35653

Created on 2024-09-19 with reprex v2.1.1

pachadotdev commented 1 month ago

@grantmcdermott

thx!!

fixed in the most recent commit

I just checked after subtracting the perfectly classified obs

devtools::load_all()
Loading required package: utils
Tracing function "install.packages" in package "utils"

Loading required package: usethis
# install.packages("bife")
data(psid, package = "bife")

lmod = feglm(
  LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME,
  data   = psid,
  family = binomial()
  )

summary(lmod)

bias_corr(lmod)
> devtools::load_all()
ℹ Loading capybara
> 
> # install.packages("bife")
> data(psid, package = "bife")
> 
> lmod = feglm(
+   LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME,
+   data   = psid,
+   family = binomial()
+   )
> 
> summary(lmod)
Formula: LFP ~ KID1 + KID2 + KID3 + log(INCH) | ID + TIME

Family: Binomial

Estimates:

|           | Estimate | Std. Error | z value  | Pr(>|z|)   |
|-----------|----------|------------|----------|------------|
| KID1      |  -1.1743 |     0.0984 | -11.9392 | 0.0000 *** |
| KID2      |  -0.5913 |     0.0862 |  -6.8578 | 0.0000 *** |
| KID3      |  -0.0157 |     0.0608 |  -0.2578 | 0.7966     |
| log(INCH) |  -0.4046 |     0.0943 |  -4.2892 | 0.0000 *** |

Significance codes: *** 99.9%; ** 99%; * 95%; . 90%

Number of observations: Full 13149; Missing 0; Perfect classification 7173 

Number of Fisher Scoring iterations: 5 
> 
> bias_corr(lmod)
binomial - logit link, l= [664, 9]

     KID1      KID2      KID3 log(INCH) 
 -1.32179  -0.66493  -0.01789  -0.45263
grantmcdermott commented 1 month ago

Great, thanks for the quick turnaround!

grantmcdermott commented 1 month ago

Should we expect bias_corr to produce the same result as alpaca::biasCorr?

pachadotdev commented 1 month ago

Should we expect bias_corr to produce the same result as alpaca::biasCorr?

in principle, yes

however,

Should we expect bias_corr to produce the same result as alpaca::biasCorr?

not really, I moved the GLM computation to C++ including the link functions, so the convergence is slightly different

the magnitudes should have the same sign