rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] Mismatch between cuml and sklearn LinearRegression for specific inputs #3465

Open · wphicks opened this issue 3 years ago

wphicks commented 3 years ago

Describe the bug
For specific training and test inputs to LinearRegression, cuml and sklearn produce significantly different outputs.

Steps/Code to reproduce bug

import cupy as cp
import numpy as np
from cuml import LinearRegression as cuLinearRegression
from sklearn.linear_model import LinearRegression as skLinearRegression

# One of several examples found with Hypothesis
X_train = np.array([
    [25000.750007327206, 25000.750007327206, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
    [1.0000000000222042, 37525.13455882354, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
    [25000.750007327206, 25000.750007327206, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
    [25000.750007327206, 25000.750007327206, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
], dtype=np.float64)
y_train = np.array([1.0, 2.003848073721735, 2.003848073721735, 2.003848073721735], dtype=np.float64)
X_test = np.array([[25000.750007327206, 25000.750007327206, 25000.750007327206, 25000.750007327206, 25000.750007327206]], dtype=np.float64)
y_test = np.array([2.003848073721735], dtype=np.float64)

cuols = cuLinearRegression()
cuols.fit(X_train, y_train)
cu_predict = cuols.predict(X_test)

skols = skLinearRegression()
skols.fit(X_train, y_train)
sk_predict = skols.predict(X_test)

print(cu_predict)  # [1.66923205]
print(sk_predict)  # [1.5625]

Expected behavior
While this is a somewhat unlikely and artificial example, I would still expect cuml and sklearn output to be a closer match. This was discovered accidentally while working toward a more sophisticated Hypothesis testing setup for #1739, and I suspect it is a red herring for that issue, but it is probably still worth investigating further.

Environment details (please complete the following information):

dantegd commented 3 years ago

@wphicks I just repro'd and saw the difference, but it seems there is a typo in the comment of the last line:

>>> print(sk_predict)  # [1.5625]
[1.625]

Seems like the comment accidentally added a 5 in there. There is still a significant difference, but I just wanted to confirm that I'm seeing the expected result from scikit.

wphicks commented 3 years ago

Nope, I'm afraid that 1.5625 is genuinely what I get on my machine. Interesting that there's a discrepancy between our setups, though...

dantegd commented 3 years ago

Particularly how they look almost the same except for one digit! I'm also using scikit 0.23.1. Very interesting.

viclafargue commented 3 years ago

I could reproduce the issue. On a DGX, the predictions are still different but look a bit more similar, as Dante observed. However, there are large differences in the coefficients and intercept. I'll look into this.

cu coef_: [-1.91074691e-05 -1.14229906e-05  0.00000000e+00  0.00000000e+00 0.00000000e+00]
sk coef_: [6.63600728e+09 1.32460419e+10 0.00000000e+00 0.00000000e+00 0.00000000e+00]
cu intercept_: 2.432516440558203
sk intercept_: -497066142073824.6

viclafargue commented 3 years ago

The difference seems to be explained by the different methods used in cuML and sklearn. cuML uses ordinary least squares (with eigendecomposition or SVD), whereas sklearn uses non-negative least squares from Scipy. This example seems to be a corner case: it produces poor results with eigendecomposition and outright crashes with SVD, as cuSolver seems unable to find a solution for it. Datasets generated with make_regression seem to produce reasonably similar coefficients, intercepts, and predictions. I don't think this is really a big concern, since the problem seems to come from a difference in the methods used and is only observable in corner cases.
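
For reference, the degeneracy is easy to see directly: the reproducer's X_train has only two distinct rows, and its last three columns are identical. A minimal NumPy check (a sketch; c stands in for the repeated value):

import numpy as np

# Sketch: the design matrix from the reproducer has three duplicated rows
# and three duplicated columns, so it is rank-deficient and the
# least-squares problem is badly ill-conditioned.
c = 25000.750007327206
X_train = np.array([
    [c, c, c, c, c],
    [1.0000000000222042, 37525.13455882354, c, c, c],
    [c, c, c, c, c],
    [c, c, c, c, c],
], dtype=np.float64)

print(np.linalg.matrix_rank(X_train))  # 2, for a 4x5 matrix
print(np.linalg.cond(X_train))         # enormous (effectively infinite)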

wphicks commented 3 years ago

Awesome! Thanks for digging into it, @viclafargue. For my part, I'd like to understand why we're using OLS where sklearn uses NNLS and whether that difference is justified, but that's probably just something I need to work through, not necessarily something that indicates a genuine problem.

In terms of making sure that this would not come up in more realistic scenarios, I'm working on hooking Hypothesis into our dataset generation methods anyway, so I'll add make_regression into that, and then we can test a range of gnarlier but still somewhat realistic inputs. Even if that comes up empty, I'd say it's still probably worth leaving this issue open and trying to fix it at some point, since surprising inputs show up in real usage, but I 100% agree that this should not be a major priority at the moment.
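
To make that concrete, here is a rough sketch of the kind of property-based test I have in mind (assuming make_regression, with illustrative strategies and tolerances rather than the actual setup for #1739):

import numpy as np
from hypothesis import given, settings, strategies as st
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression as skLinearRegression
from cuml import LinearRegression as cuLinearRegression

# Sketch only: the strategies and tolerances below are illustrative.
@given(
    n_samples=st.integers(min_value=20, max_value=200),
    n_features=st.integers(min_value=2, max_value=10),
    noise=st.floats(min_value=0.0, max_value=10.0),
    seed=st.integers(min_value=0, max_value=2**32 - 1),
)
@settings(deadline=None, max_examples=25)
def test_linear_regression_matches_sklearn(n_samples, n_features, noise, seed):
    # Overdetermined systems only (n_samples > n_features), so OLS is unique.
    X, y = make_regression(n_samples=n_samples, n_features=n_features,
                           noise=noise, random_state=seed)
    cu_pred = cuLinearRegression().fit(X, y).predict(X)
    sk_pred = skLinearRegression().fit(X, y).predict(X)
    np.testing.assert_allclose(cu_pred, sk_pred, rtol=1e-3, atol=1e-3)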

wphicks commented 3 years ago

@beckernick is having some trouble posting right now, but he was kind enough to point out a couple of interesting things on this. First, if sklearn were using NNLS, wouldn't it force all coefficients to be >= 0? Second, if we force sklearn to use NNLS by passing positive=True, we get a very interesting result:

import cupy as cp
import numpy as np
from cuml import LinearRegression as cuLinearRegression
from sklearn.linear_model import LinearRegression as skLinearRegression
# One of several examples found with Hypothesis
X_train = np.array([
    [25000.750007327206, 25000.750007327206, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
    [1.0000000000222042, 37525.13455882354, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
    [25000.750007327206, 25000.750007327206, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
    [25000.750007327206, 25000.750007327206, 25000.750007327206,
     25000.750007327206, 25000.750007327206],
], dtype=np.float64)
y_train = np.array([1.0, 2.003848073721735, 2.003848073721735, 2.003848073721735], dtype=np.float64)
X_test = np.array([[25000.750007327206, 25000.750007327206, 25000.750007327206, 25000.750007327206, 25000.750007327206]], dtype=np.float64)
y_test = np.array([2.003848073721735], dtype=np.float64)
cuols = cuLinearRegression()
cuols.fit(X_train, y_train)
cu_predict = cuols.predict(X_test)
skols = skLinearRegression(positive=True)
skols.fit(X_train, y_train)
sk_predict = skols.predict(X_test)
print(cu_predict)  # [1.66923205]
print(sk_predict)  # [1.66923205] (identical to cuml once positive=True)

I'd like to make sure we're confident that we understand what's going on under the hood there, even if this is a weird degenerate case. @viclafargue I'm happy to pick up where you left off, or you can dive back into it if you'd like to keep picking at it.

viclafargue commented 3 years ago

My bad, you're right: it uses scipy.linalg.lstsq when the positive parameter is unset. NNLS seems to produce similar predictions, which is interesting. It would also be interesting to check the coefficients and intercepts.
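
For instance, something along these lines (a sketch continuing the session above) would show the fitted parameters side by side:

# Sketch, reusing cuols and skols from the snippet above (skols was fitted
# with positive=True). NNLS constrains every entry of coef_ to be >= 0.
print("cu coef_:", cuols.coef_, "intercept_:", cuols.intercept_)
print("sk coef_:", skols.coef_, "intercept_:", skols.intercept_)
assert (skols.coef_ >= 0).all()  # the NNLS non-negativity constraint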

Scipy's least-squares function can call multiple LAPACK drivers (gelsd, gelsy, and gelss); it uses gelsd by default.
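
Since scipy.linalg.lstsq exposes the driver choice directly, all three can be run on the reproducer's data side by side (a sketch, assuming X_train and y_train from above):

import numpy as np
from scipy import linalg

# Sketch: solve the reproducer's system with each LAPACK driver supported by
# scipy.linalg.lstsq (X_train and y_train as defined in the reproducer above).
# A column of ones is appended to model the intercept.
A = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
for driver in ("gelsd", "gelsy", "gelss"):
    coef, residues, rank, sv = linalg.lstsq(A, y_train, lapack_driver=driver)
    print(driver, "coef:", coef[:-1], "intercept:", coef[-1], "rank:", rank)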

viclafargue commented 3 years ago

The three LAPACK drivers seem to produce somewhat similar results, distinct from cuML's: lapack_methods_vs_cuml.py.txt

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

wphicks commented 3 years ago

Still relevant and still worth investigating, I think. Not a super high-priority issue at the moment, though, so no recent updates.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.