Open pdimitrov-nv opened 3 years ago
I wonder if this might be an underlying difficulty in finding a non-degenerate solution given the matrix shape of (1345, 943).
As an example:
import cuml
import time

REPEATS = 5

for n in (100, 100000):
    for i in range(REPEATS):
        X, y = cuml.datasets.make_regression(n_samples=n, n_features=1000)
        model = cuml.linear_model.ElasticNet(alpha=1)
        tick = time.time()
        model.fit(X, y)
        tock = time.time()
        print(f"{n} x 1000 matrix,: {tock-tick} seconds to fit")
100 x 1000 matrix,: 1.3444466590881348 seconds to fit
100 x 1000 matrix,: 0.9838886260986328 seconds to fit
100 x 1000 matrix,: 1.3436815738677979 seconds to fit
100 x 1000 matrix,: 1.475243330001831 seconds to fit
100 x 1000 matrix,: 0.3737659454345703 seconds to fit
100000 x 1000 matrix,: 0.11346721649169922 seconds to fit
100000 x 1000 matrix,: 0.11413407325744629 seconds to fit
100000 x 1000 matrix,: 0.1149301528930664 seconds to fit
100000 x 1000 matrix,: 0.11407017707824707 seconds to fit
100000 x 1000 matrix,: 0.11468839645385742 seconds to fit
When scikit-learn's ElasticNet cannot converge, it provides a warning: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations.
Perhaps we might consider something similar?
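To make the suggestion concrete, here is a minimal sketch of what "warn when the solver hits the iteration limit" could look like. It is a plain-Python coordinate descent for the elastic net objective, written only for illustration: it is not cuML's solver, and `elastic_net_cd`, `soft_threshold`, and the `SolverDidNotConverge` category are hypothetical names. The stopping rule (largest coefficient change in a sweep at or below `tol`) is also an assumption, not the check either library uses.

```python
import warnings


class SolverDidNotConverge(UserWarning):
    """Hypothetical warning category; not part of cuML or scikit-learn."""


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, alpha=1.0, l1_ratio=0.5, max_iter=1000, tol=1e-6):
    """Minimize 1/(2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent.

    X is a list of rows, y a list of targets. Returns (w, sweeps_used).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    l1 = alpha * l1_ratio
    l2 = alpha * (1.0 - l1_ratio)
    max_change = float("inf")

    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never gets a nonzero weight
            old = w[j]
            # Correlation of column j with the residual, adding back j's own part.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * old
            new = soft_threshold(rho, l1) / (col_sq[j] + l2)
            if new != old:
                delta = new - old
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new
                max_change = max(max_change, abs(delta))
        if max_change <= tol:
            return w, sweep

    # Reached only when the tolerance was never met within max_iter sweeps.
    warnings.warn(
        f"Coordinate descent did not converge after {max_iter} sweeps "
        f"(last max coefficient change {max_change:.3e} > tol {tol:.1e}). "
        "Consider increasing max_iter.",
        SolverDidNotConverge,
    )
    return w, max_iter
```

The solver still returns its last iterate, so existing callers keep working; the warning only surfaces that the result may be unreliable, and a slow fit can be attributed to the iteration cap rather than to raw compute.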
Describe the bug
cuml.ElasticNet (and Lasso) perform much slower than expected on the attached data: shady_lea_X.zip, shady_lea_y.zip
The data shapes are X.shape=(1345, 934); Y.shape=(1345,)
Note that the default alpha=1 is fast, but alpha=0.1 and alpha=0.01 get progressively slower, and then alpha=0.001 is faster again.
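For anyone who wants to measure the alpha dependence on their own data, a small timing harness along these lines may help. It is not the reporter's reproduction script, just a sketch: `time_fits` and `make_model` are hypothetical names, and the only assumption about the estimator is that it exposes `fit(X, y)`.

```python
import time


def time_fits(make_model, X, y, alphas, repeats=3):
    """Return {alpha: [seconds, ...]} with `repeats` timed fits per alpha.

    make_model is any callable that takes alpha and returns a fresh,
    unfitted estimator with a fit(X, y) method.
    """
    timings = {}
    for alpha in alphas:
        runs = []
        for _ in range(repeats):
            model = make_model(alpha)  # fresh model so fits do not share state
            start = time.perf_counter()
            model.fit(X, y)
            runs.append(time.perf_counter() - start)
        timings[alpha] = runs
    return timings
```

With cuML this could be called as `time_fits(lambda a: cuml.linear_model.ElasticNet(alpha=a), X, y, alphas=(1, 0.1, 0.01, 0.001))`. Keeping every run rather than only a mean makes it easier to spot a first fit that is slower than the rest.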
Steps/Code to reproduce bug
Here is a file, "slowElasticNet.py", that reproduces the behavior using the uploaded data files.
Observed behavior (output of code above)
Expected behavior
Sub-second fit times instead of 5 seconds.