casonk closed this issue 1 year ago
Thanks for identifying the issue @casonk ! We'll try to repro and debug and update you asap.
Which version of cuML/RAPIDS are you running?
Edited for readability - @csadorf
@casonk We're having some trouble reproducing the issue on our side. Would it be possible for you to provide a minimal reproducible example that demonstrates the issue and includes the code showing how X and y originate? I suspect that the bug is conditional on the specific input type.
sure thing, I have included below :)
import cupy
import cuml

def fit_reg(x, y):
    lr = cuml.LinearRegression(algorithm="svd")
    reg = lr.fit(x, y)
    a = cupy.e**reg.intercept_
    c = -reg.coef_[0]
    print(a, c)

x = cupy.array([335791, 108442, 53268, 31293, 20018, 13590, 9968, 7502, 5648, 4476,
                3616, 3047, 2455, 2056, 1713, 1484, 1176, 1123, 931, 826, 745, 625,
                614, 520, 448, 404, 371, 340, 306, 289, 279, 217, 209, 185, 156, 172,
                152, 145, 125, 134, 104, 82, 79, 90, 78, 62, 69, 63, 57, 80],
               dtype=float)
y = cupy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
                37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
               dtype=float)

lg_x = cupy.log(x)
lg_y = cupy.log(y)

for i in range(10):
    fit_reg(lg_x, lg_y)
output:
304.932683952256 0.41691873243014627
251.92992105110338 0.38950253528181034
79333.1132491717 1.2627422782657807
2.867318742395954e-10 -3.8064034197008185
88421345253.00504 3.388183766447345
1.6614572067360917e-07 -2.84409395852331
3255362205.5633545 2.893837080226306
2.996515514930703e-08 -3.1157708316115436
7307898230.821039 3.0266342943280207
4.979996310126534e-08 -3.04708495405204
And with the modification as mentioned in the issue:
import cupy
import cuml

def fit_reg_copy(x, y):
    lr = cuml.LinearRegression(algorithm="svd")
    reg = lr.fit(x.copy(), y)
    a = cupy.e**reg.intercept_
    c = -reg.coef_[0]
    print(a, c)

x = cupy.array([335791, 108442, 53268, 31293, 20018, 13590, 9968, 7502, 5648, 4476,
                3616, 3047, 2455, 2056, 1713, 1484, 1176, 1123, 931, 826, 745, 625,
                614, 520, 448, 404, 371, 340, 306, 289, 279, 217, 209, 185, 156, 172,
                152, 145, 125, 134, 104, 82, 79, 90, 78, 62, 69, 63, 57, 80],
               dtype=float)
y = cupy.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
                20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
                37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
               dtype=float)

lg_x = cupy.log(x)
lg_y = cupy.log(y)

for i in range(10):
    fit_reg_copy(lg_x, lg_y)
output:
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
304.932683952256 0.41691873243014627
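The two runs differ only in whether `fit` receives the original array or a copy, which suggests that `fit` modifies its input in place. One way to check that directly is to snapshot X before the call and compare afterwards. The helper below is a hypothetical sketch: `fit_mutates_input`, `toy_fit_inplace` and `toy_fit_safe` are names made up for illustration, and the toy functions only stand in for an estimator. It runs on plain Python lists so it needs no GPU.

```python
import copy


def fit_mutates_input(fit, X, y, equal=lambda a, b: a == b):
    """Return True if calling fit(X, y) changes X in place.

    `equal` compares the snapshot with X after the call. The default works
    for lists; array types need an elementwise comparison instead.
    """
    snapshot = copy.deepcopy(X)  # independent copy taken before fitting
    fit(X, y)
    return not equal(snapshot, X)


def toy_fit_inplace(X, y):
    """Toy stand-in for an estimator that centers X in place (not cuML code)."""
    mean = sum(X) / len(X)
    for i in range(len(X)):
        X[i] -= mean


def toy_fit_safe(X, y):
    """Same toy, but operating on a private copy of X."""
    toy_fit_inplace(list(X), y)


print(fit_mutates_input(toy_fit_inplace, [1.0, 2.0, 4.0], None))  # True
print(fit_mutates_input(toy_fit_safe, [1.0, 2.0, 4.0], None))     # False
```

For CuPy arrays the default `==` yields an elementwise result, so a comparison such as `cupy.array_equal` would have to be passed as `equal`. That variant is an assumption and was not run here.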
@casonk Thanks a lot for the MRE. We were able to reproduce the issue.
https://github.com/rapidsai/cuml/blob/0c2a1035378bc8fd6def06f9896fa7361b92864b/python/cuml/linear_model/linear_regression.pyx#L302
The issue was noticed when fitting 2 models on the same X, y inputs.
Request: implement a copy_X parameter, set to True by default as in sklearn.
Current workaround: pass a copy of X to fit, e.g. lr.fit(x.copy(), y).
Note: the issue may extend to other models.
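To make the request concrete, here is a minimal sketch of how a `copy_X` flag protects the caller's data. `ToyLinearRegression` is a hypothetical pure-Python class written for this illustration. It is not cuML's implementation and only reproduces the class of bug (in-place preprocessing of a shared input), not the exact numbers reported above.

```python
class ToyLinearRegression:
    """Toy 1-D least squares with a copy_X flag; NOT cuML's implementation."""

    def __init__(self, copy_X=True):
        self.copy_X = copy_X

    def fit(self, X, y):
        if self.copy_X:
            X = list(X)  # work on a private copy; the caller's data stays intact
        n = len(X)
        mean_x = sum(X) / n
        mean_y = sum(y) / n
        for i in range(n):
            X[i] -= mean_x  # in-place centering: corrupts X when it is shared
        sxx = sum(v * v for v in X)
        sxy = sum(v * (t - mean_y) for v, t in zip(X, y))
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self


y = [1.0, 3.0, 5.0, 7.0]  # y = 2 * x + 1

# copy_X=False: the second fit sees the already-centered X.
x = [0.0, 1.0, 2.0, 3.0]
unsafe = ToyLinearRegression(copy_X=False)
print(unsafe.fit(x, y).intercept_)  # 1.0
print(unsafe.fit(x, y).intercept_)  # 4.0

# copy_X=True (the default here, mirroring sklearn): repeated fits agree.
x = [0.0, 1.0, 2.0, 3.0]
safe = ToyLinearRegression()
print(safe.fit(x, y).intercept_)  # 1.0
print(safe.fit(x, y).intercept_)  # 1.0
```

Defaulting to a copy trades some extra memory for reproducible results. Callers who know their input is disposable could still opt out with `copy_X=False`.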