scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

ENH: Lightning seems to be slow when `loss=log` #14

Closed: MechCoder closed this issue 10 years ago.

MechCoder commented 10 years ago

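@mblondel I'm not sure whether this is expected, but I ran a few quick benchmarks comparing lightning's CDClassifier with scikit-learn's LogisticRegression (liblinear) on L1-regularized logistic regression: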

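```python
from time import time

import numpy as np
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression
from lightning.classification import CDClassifier

# Load the News20 dataset from scikit-learn.
bunch = fetch_20newsgroups_vectorized(subset="all")
X = bunch.data
y = bunch.target

# Reduce to a binary problem (class 1 vs. the rest) to remove the effect
# of multiclass one-vs-rest handling and its parallelization.
y[y != 1] = -1
time_logistic = []
time_lightning = []
Cs = np.logspace(-4, 4, 10)

for C in Cs:
    print(C)
    # liblinear must be requested explicitly: recent scikit-learn versions
    # default to lbfgs, which does not support penalty='l1'.
    clf = LogisticRegression(penalty='l1', solver='liblinear', tol=0.0001,
                             fit_intercept=False, C=C)
    t = time()
    clf.fit(X, y)
    time_logistic.append(time() - t)
    print(time_logistic)
    # lightning's coordinate descent solver with the same loss and penalty.
    cl = CDClassifier(loss='log', tol=0.0001, max_iter=100, max_steps=0,
                      C=C, penalty='l1')
    t = time()
    cl.fit(X, y)
    time_lightning.append(time() - t)
    print(time_lightning)
```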

I get timings (in seconds) like the following for a grid of 10 values of C from np.logspace(-4, 4, 10):

time_lightning (lightning CDClassifier):

[0.20100116729736328, 0.6052899360656738, 0.7211019992828369,
 2.470484972000122, 4.043258190155029, 7.791965007781982,
 10.92172908782959, 13.969007968902588, 12.534989833831787,
 5.275091886520386]

time_logistic (scikit-learn LogisticRegression, liblinear):

[0.08612680435180664, 0.22542500495910645, 0.5105628967285156,
 0.5970029830932617, 0.642221212387085, 0.8863811492919922,
 1.241279125213623, 1.1004469394683838, 0.9302711486816406,
 0.8940119743347168]
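
To make "slow" concrete, the per-C slowdown can be computed directly from the two lists above. This is only arithmetic on the reported numbers; the variable names mirror the benchmark script.

```python
import numpy as np

# Timings (seconds) copied from the two lists above, one entry per C.
time_lightning = [0.20100116729736328, 0.6052899360656738, 0.7211019992828369,
                  2.470484972000122, 4.043258190155029, 7.791965007781982,
                  10.92172908782959, 13.969007968902588, 12.534989833831787,
                  5.275091886520386]
time_logistic = [0.08612680435180664, 0.22542500495910645, 0.5105628967285156,
                 0.5970029830932617, 0.642221212387085, 0.8863811492919922,
                 1.241279125213623, 1.1004469394683838, 0.9302711486816406,
                 0.8940119743347168]
Cs = np.logspace(-4, 4, 10)

# Ratio > 1 means lightning took longer than liblinear for that C.
ratios = np.array(time_lightning) / np.array(time_logistic)
for C, r in zip(Cs, ratios):
    print(f"C = {C:9.4g}: lightning / liblinear = {r:5.1f}x")
print(f"total: {sum(time_lightning):.1f}s vs {sum(time_logistic):.1f}s")
```

On these numbers the gap ranges from about 1.4x (third C) to about 13.5x (ninth C), and the totals are roughly 58.5 s vs 7.1 s. As the thread goes on to point out, these are wall-clock times at the same tol, not times to reach the same objective value.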

mblondel commented 10 years ago

What is your point?

liblinear doesn't implement the same algorithm as lightning...

MechCoder commented 10 years ago

Liblinear implements CD + GLMNET from the same paper, right? I just wanted to check whether newGLMNET is inherently much faster than CDN, or whether you think there are places in the lightning code that we could speed up.
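
For reference, here is a rough sketch of one CDN sweep (coordinate descent with a one-variable Newton direction and an Armijo backtracking line search), loosely following the description for L1-regularized logistic regression by Yuan et al. It is a plain NumPy illustration, not lightning's or liblinear's code. Assumptions: dense X, labels in {-1, +1}, objective ||w||_1 + C * sum log(1 + exp(-y w^T x)) with no intercept. The helper names `objective` and `cdn_sweep` and the defaults `sigma=0.01`, `beta=0.5` are hypothetical choices for this example.

```python
import numpy as np

def objective(w, X, y, C):
    # F(w) = ||w||_1 + C * sum_i log(1 + exp(-y_i * w^T x_i)), no intercept
    return np.abs(w).sum() + C * np.logaddexp(0.0, -y * (X @ w)).sum()

def cdn_sweep(w, X, y, C, sigma=0.01, beta=0.5, max_steps=30):
    """One cyclic CDN pass over all features (illustrative sketch).

    Assumes dense X and labels y in {-1, +1}. Margins are recomputed from
    scratch for clarity; real solvers update them incrementally.
    """
    w = np.array(w, dtype=np.float64)
    for j in range(X.shape[1]):
        xj = X[:, j]
        tau = 0.5 * (1.0 + np.tanh(0.5 * y * (X @ w)))  # sigmoid(y_i w^T x_i)
        # First and second partial derivatives of the loss term w.r.t. w_j
        g = C * np.sum((tau - 1.0) * y * xj)
        h = C * np.sum(tau * (1.0 - tau) * xj ** 2) + 1e-12
        # Closed-form minimizer of g*d + 0.5*h*d^2 + |w_j + d|
        if g + 1.0 <= h * w[j]:
            d = -(g + 1.0) / h
        elif g - 1.0 >= h * w[j]:
            d = -(g - 1.0) / h
        else:
            d = -w[j]
        if d == 0.0:
            continue
        # Armijo backtracking: accept the first step with sufficient decrease
        delta = g * d + abs(w[j] + d) - abs(w[j])
        f_old = objective(w, X, y, C)
        step = 1.0
        for _ in range(max_steps):
            w_trial = w.copy()
            w_trial[j] += step * d
            if objective(w_trial, X, y, C) - f_old <= sigma * step * delta:
                w = w_trial
                break
            step *= beta
    return w
```

Every line-search trial in this sketch evaluates exp/log over all samples for a single coordinate. newGLMNET, by contrast, builds a quadratic approximation over all coordinates once per outer iteration and runs inner CD on it, so it should need far fewer of those evaluations; that is plausibly a large part of the gap.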

mblondel commented 10 years ago

Indeed, liblinear uses CD + GLMNET. Don't they compare it with CD in the paper?

One bottleneck is the computation of logarithms and exponentials. Using some kind of fast approximation could result in a big speed-up.
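
One well-known approximation of this kind is Schraudolph's (1999) trick: write a scaled, shifted copy of the input directly into the exponent bits of an IEEE-754 double. Below is a minimal NumPy sketch of it; `fast_exp` and `fast_sigmoid` are hypothetical helpers, not part of lightning. Inputs are clipped to [-700, 700] so the bit pattern stays valid, and the relative error is a few percent (under 4% with the constant 60801), which may or may not be acceptable inside a solver's line search.

```python
import numpy as np

# Constants from Schraudolph's construction for IEEE-754 doubles.
_EXP_A = 2**20 / np.log(2.0)   # scales x / ln(2) into the exponent field
_EXP_B = 1023 * 2**20          # exponent bias, positioned in the high word
_EXP_C = 60801                 # correction that lowers the RMS relative error

def fast_exp(x):
    """Approximate exp(x) elementwise by building each double's high 32 bits."""
    x = np.clip(np.atleast_1d(np.asarray(x, dtype=np.float64)), -700.0, 700.0)
    high = (_EXP_A * x + (_EXP_B - _EXP_C)).astype(np.int64)
    # Move the integer into the upper word; the low mantissa bits stay zero.
    return (high << 32).view(np.float64)

def fast_sigmoid(m):
    # sigmoid(m) = 1 / (1 + exp(-m)), the quantity CD needs for the log loss
    return 1.0 / (1.0 + fast_exp(-m))
```

Because the error is relative and bounded, the derived sigmoid is off by at most about 0.01 in absolute terms; whether that is accurate enough to keep the line search and stopping criterion well-behaved would need to be checked.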

lightning uses dataset and loss-function abstractions, and I am not sure how much overhead the virtual method calls add.
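
One way to quantify this is to time the same inner loop written against an abstract loss interface and with the loss formulas inlined. The sketch below only illustrates that measurement pattern in pure Python; `Loss`, `LogLoss` and the two loop functions are hypothetical. Python method calls cost far more than a Cython vtable call, so its absolute numbers say nothing about lightning itself; the same comparison would have to be made in Cython to answer the question.

```python
import math
import timeit

class Loss:
    """Hypothetical loss interface: the solver only calls these methods."""
    def grad_hess(self, m):
        raise NotImplementedError

class LogLoss(Loss):
    def grad_hess(self, m):
        # First and second derivatives of log(1 + exp(-m)) with respect to m.
        s = 0.5 * (1.0 + math.tanh(0.5 * m))  # sigmoid(m), overflow-free
        return s - 1.0, s * (1.0 - s)

def sum_derivs_dispatch(loss, margins):
    # Generic loop: one dynamically dispatched method call per sample.
    g = h = 0.0
    for m in margins:
        dg, dh = loss.grad_hess(m)
        g += dg
        h += dh
    return g, h

def sum_derivs_inlined(margins):
    # Same arithmetic with the log-loss formulas written in place.
    g = h = 0.0
    for m in margins:
        s = 0.5 * (1.0 + math.tanh(0.5 * m))
        g += s - 1.0
        h += s * (1.0 - s)
    return g, h

if __name__ == "__main__":
    margins = [0.001 * i - 5.0 for i in range(10_000)]
    loss = LogLoss()
    t_dispatch = timeit.timeit(lambda: sum_derivs_dispatch(loss, margins), number=20)
    t_inline = timeit.timeit(lambda: sum_derivs_inlined(margins), number=20)
    print(f"dispatch: {t_dispatch:.3f}s  inlined: {t_inline:.3f}s")
```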

mblondel commented 10 years ago

Something else to be careful about is the stopping criterion. If liblinear doesn't use the same criterion as lightning, tol will mean different things in the two libraries. The only objective way to compare them is to plot the objective value against time.
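
A minimal sketch of the quantity to plot is below. It assumes both libraries minimize ||w||_1 + C * sum_i log(1 + exp(-y_i w^T x_i)) without an intercept (for lightning, with its penalty weight left at the default). `l1_logreg_objective` is a hypothetical helper, not an API of either library.

```python
import numpy as np

def l1_logreg_objective(w, X, y, C):
    """Primal objective ||w||_1 + C * sum_i log(1 + exp(-y_i * w^T x_i)).

    X may be a dense NumPy array or a SciPy sparse matrix; y holds labels
    in {-1, +1}. No intercept, matching fit_intercept=False above.
    """
    w = np.asarray(w, dtype=np.float64).ravel()
    margins = y * np.asarray(X @ w).ravel()
    # np.logaddexp(0, -m) evaluates log(1 + exp(-m)) without overflow.
    return np.abs(w).sum() + C * np.logaddexp(0.0, -margins).sum()
```

Evaluating it on `clf.coef_.ravel()` and `cl.coef_.ravel()` after fits with progressively tighter `tol` (or larger `max_iter`), and recording the elapsed time of each fit, gives objective-versus-time curves that do not depend on how either library defines `tol`.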