Allow sparse X, fixes #271

rhjohnstone commented 3 years ago

Allow sparse X when fitting and predicting - fixes #271.

An example using the new version:

from ngboost import NGBSurvival
import numpy as np
import scipy.sparse as sp
from time import perf_counter

def generate_X_T_E(n, d):
    X = np.random.choice(2, p=[0.9, 0.1], size=(n, d))
    T = np.random.rand(n)
    E = np.random.randint(2, size=n)
    return X, T, E

n_train = 10000
n_val = 1000
d = 3000

X, T, E = generate_X_T_E(n_train, d)
X_val, T_val, E_val = generate_X_T_E(n_val, d)

# Full/dense
ngb = NGBSurvival(n_estimators=101, learning_rate=0.1, verbose_eval=10, random_state=0)
t0 = perf_counter()
ngb.fit(X, T, E, X_val=X_val, T_val=T_val, E_val=E_val, early_stopping_rounds=50)
print(perf_counter() - t0, "s")

t0 = perf_counter()
y_dists = ngb.pred_dist(X_val)
print(perf_counter() - t0, "s")
print(y_dists.params["s"].mean())

# Sparse
ngb = NGBSurvival(n_estimators=101, learning_rate=0.1, verbose_eval=10, random_state=0)
t0 = perf_counter()
ngb.fit(sp.csr_matrix(X), T, E, X_val=sp.csr_matrix(X_val), T_val=T_val, E_val=E_val, early_stopping_rounds=50)
print(perf_counter() - t0, "s")

t0 = perf_counter()
y_dists = ngb.pred_dist(sp.csr_matrix(X_val))
print(perf_counter() - t0, "s")
print(y_dists.params["s"].mean())

which gives me a 5x speedup with sparse X (the sparse run takes 20% of the dense run's time) and the same predicted output as the dense run to 15 decimal places.
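For anyone wanting to reproduce the comparison, here is a minimal sketch of how the "same output" claim and the size difference could be checked. It uses only NumPy and SciPy, not ngboost. The helper name `max_abs_diff` and the smaller matrix shape are assumptions made for illustration and are not part of this PR.

```python
import numpy as np
import scipy.sparse as sp


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two array-likes."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


# Hypothetical, smaller stand-in for the ~10%-dense binary X used above.
rng = np.random.default_rng(0)
X = rng.choice(2, p=[0.9, 0.1], size=(1000, 300))
X_sparse = sp.csr_matrix(X)

# Converting to CSR and back should reproduce the dense matrix exactly.
roundtrip_diff = max_abs_diff(X, X_sparse.toarray())

# Rough memory footprint: the dense array vs. the three CSR arrays.
dense_bytes = X.nbytes
sparse_bytes = (
    X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
)

print("round-trip max abs diff:", roundtrip_diff)
print("dense bytes:", dense_bytes, "sparse bytes:", sparse_bytes)
```

To compare model outputs the same way, one could keep the dense and sparse `y_dists.params["s"]` arrays from the example in two separate variables and pass them to `max_abs_diff`.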

ryan-wolbeck commented 2 years ago

@alejandroschuler this PR looks functionally good to me, do you have any concerns with this implementation?

alejandroschuler commented 2 years ago

> @alejandroschuler this PR looks functionally good to me, do you have any concerns with this implementation?

Sorry for the delay, been busy with other projects. This is a great feature addition!

stanfordmlgroup / ngboost

Allow sparse X, fixes #271 #272