Closed kylebutts closed 1 year ago
Hi @kylebutts , thanks so much for this PR. The speed gains look amazing. I have taken a first look at the code, and I think it looks excellent, and I will be more than happy to accept this PR once the sparse model matrix makes it into fixest.
I have a few comments, which I am listing here (but I will also comment in the PR). Some of these, I might just fix myself as I go through the code line by line - in case I do this, I will mark it as done (I hope that's ok with you?).
=
assignment (as in fixest) vs <-
assignment as in the rest of summclustarma::pinv
is around twice as fast as MASS::ginv
. For performance reasons, it might make sense to switch, at the cost of adding RcppArmadillo
as a (heavy) dependency. But all the Rcpp packages are extremely well maintained (practically one epsilon below base R), so I would be fine to add this dependencyOther general comments I had:
sparse = TRUE
as default for vcov_CR3J
. If the summclust
function has no sparse
argument itself, users will never have the option to produce results for the leverage statistics that differ from stata (at the cost of small drops in performance). We can even consider to return slightly fewer results for the vcov_CR3J
class when sparse = TRUE
, as practically all results are only exported for computing leverage stats. E.g. we could add a second class attribute as
class(res) <- c("vcov_CR3J", "sparse")
and then not return the beta_jack's etc for this class.
I'll play around with the code a little bit more and might add to this list. Anyways, thanks a lot - this is a really exciting PR! =)
Best, Alex
MPinv
from VCA package produces the same results as MASS::ginv
. It passes all the tests and should work now. However, still using sparse produces different results on summclust.
The reason I pushed onto this PR is that vcov_CR3J.fixest
now works fully without densifying (may be a made up word) matrices
Adds the option
sparse
tovcov_CR3J
. This is substantially faster (see reprex), but (1) requiresfixest::sparse_model_matrix
to be merged (https://github.com/lrberge/fixest/pull/418) and (2) results in differentbeta_jack
. This is fine for the vcov, but makessummclust
not match Stata. All tests passed and I added a set of tests to make sure sparse/dense versions match each other.Added a little speedup to partial-leverage. In R, you basically never want to use
solve(A) %*% b
. It's an order of magnitude slower thanMatrix::solve(A, b)
.Made two modifications to existing tests. For
CR3J_vs_HC3
, I droppednlswork
to 100 rows. It's much faster and effectively the same test. For the plot and the summary, I added aquietly
function so it doesn't print out a bunch of results and open the plot pane when testing.vcov_CR3J
sparse example:Created on 2023-06-10 with reprex v2.0.2
solve(A) %*% b
vs.solve(A, b)
Created on 2023-06-10 with reprex v2.0.2