scikit-learn-contrib / skglm

Fast and modular sklearn replacement for generalized linear models
http://contrib.scikit-learn.org/skglm
BSD 3-Clause "New" or "Revised" License

ENH - Numba compilation at import instead of execution time #148

Closed Badr-MOUFAD closed 5 months ago

Badr-MOUFAD commented 1 year ago

I discovered that we can pre-compile numba functions without having to run them, by specifying the function's signature in @njit:

import time
import numpy as np
from numba import njit

@njit("f8(f8[:])")
def compute_sum(arr):
    total = 0.  # avoid shadowing the built-in ``sum``
    for element in arr:
        total += element
    return total

arr = np.random.randn(10_000)

start = time.perf_counter()
# runs with no compilation overhead: already compiled at import time
compute_sum(arr)
end = time.perf_counter()

print("total elapsed time:", end - start)

By doing so, we shift the entire compilation overhead to import time, freeing us from having to do a warm-up first run to cache the numba compilation (as done in the benchmarks).

This is worth considering, but it requires attention since we have many functions.

The advantage is clear on small examples, and I also tried it on a fairly large codebase. However, I don't have much visibility on its impact on the whole package.


Also related to https://github.com/scikit-learn-contrib/skglm/issues/106

mathurinm commented 5 months ago

I am -0.5 on this: it requires more code maintenance, it may compile more than needed, and it does not reduce the total time spent on compilation in a script.

The fact that it makes benchmarking easier is not enough IMO (we try to target users now, more than optimization researchers).

@Badr-MOUFAD @QB3 I'll close this; reopen if you disagree!