py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/
MIT License
179 stars 35 forks source link

fixef: Implement Iterative Least Squares Solver #469

Closed s3alfisc closed 4 months ago

s3alfisc commented 6 months ago

Context

Currently, pyfixest can compute fixed effect estimates in the fixef() method by using a sparse solver.

A potentially faster alternative is to implement an iterative solver. See section 3 in the lfe paper.

s3alfisc commented 4 months ago

Hi Ben (@b-knight) - I have been playing around with scipy's solver for linear systems and it's much faster than the sparse solver I have implemented by hand:

import numpy as np
from scipy.sparse import rand
from scipy.sparse.linalg import lsqr, spsolve
import time

m, n = 2_000_000, 500
A = rand(m, n, density=0.1, format='csr')
x_true = np.random.rand(n)
b = A @ x_true

tic = time.time()
result = lsqr(A, b)
toc = time.time()
print(f"Elapsed time: {toc - tic:.2f} seconds")
x_lsqr = result[0]
x_lsqr[0:5]
#Elapsed time: 1.35 seconds
#array([0.5335062 , 0.99837454, 0.69382102, 0.5373236 , 0.45653057])

tic = time.time()
alpha = spsolve(A.transpose() @ A, A.transpose() @ b)
toc = time.time()
print(f"Elapsed time: {toc - tic:.2f} seconds")
alpha[0:5]
# Elapsed time: 30.38 seconds
# array([0.53350756, 0.99837705, 0.69382127, 0.53732394, 0.45653019])

Hence to speed up the fixef and the predict method, we'd have to change this line https://github.com/py-econometrics/pyfixest/blob/405ba73787af6bac793afb5bdbb34d2e7ad6cd53/pyfixest/estimation/feols_.py#L1329 in .fixef() to using the lsqr solver.

Would you like to implement the change? Else I'm happy to do it myself =)

Best, Alex

s3alfisc commented 4 months ago

@greenguy33 can I assign this to you? I think this should be a minimal change =)

greenguy33 commented 4 months ago

@s3alfisc sure, I'll take a look asap (ideally later this week or early next)