py-econometrics / pyfixest

Fast High-Dimensional Fixed Effects Regression in Python following fixest-syntax
https://py-econometrics.github.io/pyfixest/pyfixest.html
MIT License

ritest: Optimize the numba code in `ritest` #461

Open s3alfisc opened 1 month ago

s3alfisc commented 1 month ago

Context

A large part of the ritest module is written in numba. As I am not a great numba coder, there is likely quite a bit of optimization potential.

To do

Ask @styfenschaer whether he has the time and interest to take a closer look =)

styfenschaer commented 1 month ago

Hi @s3alfisc

Sure, I can take a look at it. Do you have any benchmark cases that can be used for profiling?

s3alfisc commented 1 month ago

Hi @styfenschaer, that's super cool, thank you so much! I am writing this in a hurry, so please excuse the brief message. The function to optimize is _run_ri. Below you can find an example of how to run it, which you could use for profiling =)

# IPython/Jupyter magics; omit these two lines when running as a plain Python script
%load_ext autoreload
%autoreload 2

import numpy as np
from pyfixest.estimation.ritest import _run_ri

N = 1000
k = 4

Y_arr = np.random.normal(size=N).reshape((-1,1))
X_arr = np.random.normal(size=N*k).reshape((N, k))
resampvar_arr = np.random.normal(size=N).reshape((-1,1))
fval_arr = np.random.choice(range(10), N, True).reshape((-1,1)).astype(int)
weights = np.ones((N,1))

rng = np.random.default_rng(1234)

_run_ri(
    reps=1000,                    # int
    rng=rng,                      # np.random.Generator
    resampvar_arr=resampvar_arr,  # two-dimensional array
    fval=fval_arr,                # two-dimensional array of ints
    Y_demean=Y_arr.flatten(),     # one-dimensional array
    X_demean2=X_arr,              # two-dimensional array
    weights=weights.flatten(),    # one-dimensional array
)

res = _run_ri(
    reps=1000,
    rng=rng,
    resampvar_arr=resampvar_arr,
    fval=None,
    Y_demean=Y_arr.flatten(),
    X_demean2=X_arr,
    weights=weights.flatten(),
)

res[0:5]
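
For profiling, it may help to keep numba's one-off compilation cost separate from the steady-state run time. Below is a minimal timing sketch, not part of pyfixest: `time_call` and `summarize` are hypothetical helper names, and it assumes that the first call to a numba-backed function includes JIT compilation, which is why warm-up calls are excluded from the measurements.

```python
import time
from statistics import median


def time_call(func, *args, warmup=1, repeats=5, **kwargs):
    """Return a list of wall-clock timings (seconds) for func(*args, **kwargs).

    Hypothetical helper: the warm-up calls run first and are not timed, so any
    JIT compilation triggered on the first call does not distort the results.
    """
    for _ in range(warmup):
        func(*args, **kwargs)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Collapse a list of timings into min / median / max."""
    return {"min": min(timings), "median": median(timings), "max": max(timings)}


# Possible usage with the arrays defined above (assumes pyfixest is installed):
#
# timings = time_call(
#     _run_ri,
#     reps=1000,
#     rng=rng,
#     resampvar_arr=resampvar_arr,
#     fval=fval_arr,
#     Y_demean=Y_arr.flatten(),
#     X_demean2=X_arr,
#     weights=weights.flatten(),
# )
# print(summarize(timings))
```

Comparing these numbers before and after a change gives a rough baseline; a line-level profiler would still be needed to see where the time inside `_run_ri` is spent.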