
[BUG] SGD produces nans in predict method #667

Open divyegala opened 5 years ago

divyegala commented 5 years ago

Describe the bug
SGD produces NaNs when using its predict method on the Kaggle house prices dataset (I preprocessed the training data to remove NaNs and one-hot encoded the categorical columns).
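Roughly, the preprocessing looked like this (a sketch assuming pandas and the Kaggle train.csv; the actual script may differ):

import numpy as np
import pandas as pd

# Sketch of the preprocessing described above; the file path and the
# 'SalePrice' target column are assumptions based on the Kaggle dataset.
df = pd.read_csv('train.csv')
df = df.dropna(axis=1)        # remove columns containing NaNs
df = pd.get_dummies(df)       # one-hot encode categorical columns

y = df['SalePrice'].to_numpy(dtype=np.float64)
X = df.drop(columns=['SalePrice']).to_numpy(dtype=np.float64)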

Steps/Code to reproduce bug
Read the data as:

import numpy as np

# Load the preprocessed train/validation splits
Xtrain = np.genfromtxt('/datasets/sgs_house_prices/Xtrain.csv', delimiter=',')
ytrain = np.genfromtxt('/datasets/sgs_house_prices/ytrain.csv', delimiter=',')
Xval = np.genfromtxt('/datasets/sgs_house_prices/Xval.csv', delimiter=',')
yval = np.genfromtxt('/datasets/sgs_house_prices/yval.csv', delimiter=',')

Train as:

from cuml.solvers import SGD as cuSGD

cu_sgd = cuSGD(alpha=50, eta0=0.005, penalty='l1', epochs=100)
cu_sgd.fit(Xtrain, ytrain)

Test as:

Y = cu_sgd.predict(Xval)
print(Y)

Output:

0    nan
1    nan
2    nan
3    nan
4    nan
5    nan
6    nan
7    nan
8    nan
9    nan
[282 more rows]
dtype: float64

Expected behavior
scikit-learn's linear_model.SGDClassifier produces proper results with the same parameters. The expectation is to obtain the predicted target column values.


oyilmaz-nvidia commented 5 years ago

@divyegala Is there a specific reason to set alpha=50? 50 is a very unusual value; alpha is typically between 0 and 1. Because of this alpha, the solver is not able to converge to a solution. Do you observe the same thing with a reasonable alpha (between 0 and 1)?
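For example (a minimal sketch reusing the Xtrain/ytrain/Xval arrays from the report above; alpha=0.0001 is scikit-learn's default):

from cuml.solvers import SGD as cuSGD

# Same setup as the report, but with a conventional alpha in (0, 1);
# alpha=0.0001 matches scikit-learn's default regularization strength.
cu_sgd = cuSGD(alpha=0.0001, eta0=0.005, penalty='l1', epochs=100)
cu_sgd.fit(Xtrain, ytrain)
print(cu_sgd.predict(Xval))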

oyilmaz-nvidia commented 5 years ago

@divyegala scikit-learn's results should also be very unstable. Maybe we can add a warning when alpha is very high, like 50.
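Something like this hypothetical sketch (the helper name and the threshold are illustrative only, not existing cuML code):

import warnings

# Hypothetical check, not part of cuML; the 1.0 cutoff is an assumption.
def warn_on_large_alpha(alpha, threshold=1.0):
    if alpha > threshold:
        warnings.warn(
            f"alpha={alpha} is unusually large; SGD may fail to converge "
            "and produce NaNs. Typical values lie between 0 and 1.",
            UserWarning,
        )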

divyegala commented 5 years ago

> @divyegala Is there a specific reason to set alpha=50? 50 is a very unusual value; alpha is typically between 0 and 1. Because of this alpha, the solver is not able to converge to a solution. Do you observe the same thing with a reasonable alpha (between 0 and 1)?

Right, I was doing that because I wanted to observe the effect of a high-penalty L1 regression and see how many dimensions could be dropped. Also, the results are unstable even when I keep alpha between 0 and 1; I tried a lot of values.

divyegala commented 5 years ago

> @divyegala scikit-learn's results should also be very unstable. Maybe we can add a warning when alpha is very high, like 50.

scikit-learn is giving stable values with that alpha.

oyilmaz-nvidia commented 5 years ago

Btw, maybe you didn't notice, but our implementation is mini-batched stochastic gradient descent. It's slightly different from scikit-learn's implementation.

divyegala commented 5 years ago

> Btw, maybe you didn't notice, but our implementation is mini-batched stochastic gradient descent. It's slightly different from scikit-learn's implementation.

I did notice that and played around with the batch size too. While I sometimes got fewer NaNs and some actual numbers, those values still didn't make sense; I was testing with the house prices dataset and was getting negative values in my predictions.
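The sweep was roughly of this form (a sketch reusing the arrays from the report; the batch sizes and the alpha are illustrative):

from cuml.solvers import SGD as cuSGD

# Try a few batch sizes with an alpha in (0, 1) and inspect the predictions;
# batch_size is a cuml.solvers.SGD parameter (default 32).
for batch_size in (32, 128, 512, 1024):
    model = cuSGD(alpha=0.5, eta0=0.005, penalty='l1',
                  epochs=100, batch_size=batch_size)
    model.fit(Xtrain, ytrain)
    print(batch_size, model.predict(Xval))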

oyilmaz-nvidia commented 4 years ago

Hi, checking this bug. Is there any way I can get the data and the preprocessing code so I can replicate the error?