Closed s3alfisc closed 4 months ago
Hi Ben (@b-knight) - I have been playing around with scipy's
solver for linear systems and it's much faster than the sparse solver I have implemented by hand:
import numpy as np
from scipy.sparse import rand
from scipy.sparse.linalg import lsqr, spsolve
import time
m, n = 2_000_000, 500
A = rand(m, n, density=0.1, format='csr')
x_true = np.random.rand(n)
b = A @ x_true
tic = time.time()
result = lsqr(A, b)
toc = time.time()
print(f"Elapsed time: {toc - tic:.2f} seconds")
x_lsqr = result[0]
x_lsqr[0:5]
#Elapsed time: 1.35 seconds
#array([0.5335062 , 0.99837454, 0.69382102, 0.5373236 , 0.45653057])
tic = time.time()
alpha = spsolve(A.transpose() @ A, A.transpose() @ b)
toc = time.time()
print(f"Elapsed time: {toc - tic:.2f} seconds")
alpha[0:5]
# Elapsed time: 30.38 seconds
# array([0.53350756, 0.99837705, 0.69382127, 0.53732394, 0.45653019])
Hence to speed up the fixef
and the predict
method, we'd have to change this line https://github.com/py-econometrics/pyfixest/blob/405ba73787af6bac793afb5bdbb34d2e7ad6cd53/pyfixest/estimation/feols_.py#L1329 in .fixef()
to using the lsqr
solver.
Would you like to implement the change? Else I'm happy to do it myself =)
Best, Alex
@greenguy33 can I assign this to you? I think this should be a minimal change =)
@s3alfisc sure, I'll take a look asap (ideally later this week or early next)
Context
Currently, pyfixest can compute fixed effect estimates in the fixef() method by using a sparse solver.
A potentially faster alternative is to implement an iterative solver. See section 3 in the lfe paper.