xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.io
Apache License 2.0

BUG: `LogisticRegression.fit` has poor performance on speed #717

Open JiaYaobo opened 9 months ago

JiaYaobo commented 9 months ago

Describe the bug

LogisticRegression.fit does not appear to terminate once the workload is even slightly larger.

To Reproduce

With max_iter=1, everything works fine:

from xorbits._mars.learn.glm import LogisticRegression
import numpy as np

n_rows = 100
n_cols = 2
X = np.random.randn(n_rows, n_cols)
y = np.random.randint(0, 2, n_rows)

lr = LogisticRegression(max_iter=1)
lr.fit(X, y)

However, after increasing max_iter to 100, the program never seems to stop (it was still running after at least 1 minute, which is odd):

lr = LogisticRegression(max_iter=100)
lr.fit(X, y)

  1. Python version: 3.10.2
  2. Xorbits version: HEAD, installed locally
  3. Platform: MacBook with an M1 Pro chip

JiaYaobo commented 9 months ago

After some inspection, https://github.com/mars-project/mars/issues/2505#issue-1021662005 may explain it: a gradient-based iterative loop for logistic regression is inefficient in xorbits.
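
For reference, here is a minimal plain-NumPy sketch of the kind of gradient-based loop I mean. It is not the xorbits/Mars implementation; the function name `fit_logistic_gd`, the step size `lr`, and the tolerance `tol` are hypothetical choices for illustration. The point is only that every iteration depends on the weights from the previous one, and the convergence check needs a concrete number each time. My assumption (not confirmed by the linked issue) is that in a deferred-execution framework each such step may have to be scheduled and executed separately, so the per-iteration overhead adds up with max_iter.

```python
import numpy as np


def sigmoid(z):
    # Logistic function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_gd(X, y, max_iter=100, lr=0.1, tol=1e-6):
    """Batch gradient descent for logistic regression (illustrative only)."""
    n_rows, n_cols = X.shape
    w = np.zeros(n_cols)
    n_steps = 0
    for _ in range(max_iter):
        # The gradient of the mean log-loss depends on the current weights,
        # so iterations cannot be computed independently of each other.
        grad = X.T @ (sigmoid(X @ w) - y) / n_rows
        w_new = w - lr * grad
        n_steps += 1
        # The stopping test needs a materialized scalar at every step.
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w, n_steps


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = rng.integers(0, 2, 100)

w, n_steps = fit_logistic_gd(X, y, max_iter=100)
print(w, n_steps)
```

In NumPy this finishes almost instantly on 100 rows, which is why the runtime above looks like overhead from the loop itself rather than from the arithmetic.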

luweizheng commented 8 months ago

Maybe we should take a different approach to control flow, such as for loops. For example, the method in this paper.