Oleg-dM opened this issue 3 years ago
Tagging @RAMitchell @venkywonka @vinaydes for input on the issue
Thank you for the bug issue @Oleg-dM! This doesn't address the main problem, but the large discrepancy between `cuml.metrics.regression.mean_squared_error` and `sklearn.metrics.mean_squared_error` is (I think) a bug in the way cuml deals with arrays:

* From the above script, `y_test.shape` is (6250, 1) while `predictions.shape` is (6250,).
* By using `cp.ravel(y_test)` instead of `y_test`, both metrics become equivalent.

Will find the root cause for why this is the case and file a bug 👍🏾
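To illustrate the shape mismatch described above, here is a minimal, hypothetical repro sketch. The variable names and shapes are taken from the comment; the data itself is random stand-in data rather than the script's actual output, and it assumes the mismatched call runs without raising, as implied above:

```python
import cupy as cp
from cuml.metrics import mean_squared_error as cu_mse
from sklearn.metrics import mean_squared_error as sk_mse

# Random stand-ins with the shapes reported above:
# y_test is (6250, 1) while predictions is (6250,).
n = 6250
y_test = cp.random.rand(n, 1, dtype=cp.float32)
predictions = cp.random.rand(n, dtype=cp.float32)

# Mismatched shapes: (n, 1) vs (n,) -- this is where the discrepancy appears.
mse_mismatched = float(cu_mse(y_test, predictions))

# Flattening y_test first makes cuml's and sklearn's metrics agree
# (up to floating-point error).
mse_cuml = float(cu_mse(cp.ravel(y_test), predictions))
mse_sklearn = sk_mse(cp.asnumpy(y_test).ravel(), cp.asnumpy(predictions))
print(mse_mismatched, mse_cuml, mse_sklearn)
```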
Thank you Venkat - how serious do you think the main issue is?
One thing I didn't mention is that version 21.06, on which the test was run, was a source build for the GTX architecture (CC 6.1) - could that explain the good performance of the GTX 1070ti compared to all the other models?
Thank you in advance for keeping us up to date!
I believe the differences come from `make_regression` and `train_test_split`! Try using sklearn's `make_regression` and `train_test_split` in your script. In doing so, the MSE values come out pretty close to each other, as expected. EDIT: they may not be exactly identical due to floating-point arithmetic.

My concern is the highly different MSE across GPU models:
If you look at the detailed test results, the RTX 4000 performs consistently worse than the RTX 3060ti: the MSE is always around 35 for the former and around 23 for the latter.
That is the inconsistency mentioned in the issue title - reproducibility is not a concern afaic
I ran this test in particular to make the issue easier to understand, but in our much more complex system we see the same problem, i.e. different cards do not produce the same results.
This is a real issue: outputs vary from card to card and therefore cannot be considered reliable!
I understand your concern @Oleg-dM, but from what I discovered, the noticeably different MSE for different cards is due to different `X_train`, `X_test`, `y_train` and `y_test` being given to the models. If you make sure the inputs to the model are identical, then the MSE values come out very close to each other. (The small variations are attributed to floating-point arithmetic.)
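For reference, a minimal sketch of how to feed identical data to every card: generate the dataset and split on the host with a fixed seed, then hand the same arrays to cuML. Parameter values such as `n_samples`, `test_size` and `random_state` are assumed, not taken from the original script:

```python
# Sketch: host-side data generation with a fixed seed so that every GPU sees
# exactly the same X_train/X_test/y_train/y_test (assumed parameter values).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from cuml.ensemble import RandomForestRegressor
from cuml.metrics import mean_squared_error as cu_mse

# Generate once on the CPU; identical on every machine for the same seed.
X, y = make_regression(n_samples=25000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X.astype(np.float32), y.astype(np.float32), test_size=0.25, random_state=0
)

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)
preds = model.predict(X_test)

# With identical inputs, the MSE should agree across cards up to float error.
print(float(cu_mse(y_test, preds)))
```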
Doing the same experiment for the random forest classifier using sklearn's `make_classification` gives EXACT results, as no floating-point arithmetic is involved there.
Running your above script using sklearn's `make_regression` and `train_test_split`, I get the following average MSEs:
The MSE is averaged over 15 runs and even so is significantly different across cards - we are talking about a 50% deviation between the RTX 3060ti and RTX 4000 results.
Variations in the `make_regression` or `train_test_split` output cannot explain that.
I will test the issue using the exact same dataset for all cards to double check
EDIT: tested on a fixed dataset on different cards and got the same MSE
So the `make_regression` and `train_test_split` functions are dependent on the card architecture?
Sorry for the confusion
Note: for the issue with `make_regression` and `train_test_split`, a PR to CuPy was merged recently (https://github.com/cupy/cupy/pull/5838) that fixes things, so the non-deterministic behavior there should be fixed soon as well.
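For what it's worth, here is a hedged sketch of how the data-generation determinism could be checked once that fix lands: print a simple checksum of the generated splits so the output can be compared across cards. The sizes, seed and the use of `cuml.datasets.make_regression` / `cuml.model_selection.train_test_split` are assumptions, not taken from the original script:

```python
# Sketch: print shapes and a checksum of the generated data so they can be
# compared across cards; identical seeds should give identical data once the
# non-determinism referenced above is fixed.
import cupy as cp
from cuml.datasets import make_regression
from cuml.model_selection import train_test_split

X, y = make_regression(n_samples=25000, n_features=100, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

for name, arr in [("X_train", X_train), ("X_test", X_test),
                  ("y_train", y_train), ("y_test", y_test)]:
    print(name, arr.shape, float(cp.asnumpy(arr).astype("float64").sum()))
```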
This issue has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled `inactive-90d` if there is no activity in the next 60 days.
This issue has been labeled `inactive-90d` due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Hello,
I've been using cuML for 3-4 months now and recently noticed something strange: RFRegressor performance is inconsistent from one GPU model to another.
This seems to have worsened in the latest release, as detailed below.
Test description
The test (adapted from a cuML documentation example) consists of running a regression 15 times using `make_regression` and an RFRegressor, and averaging the `mean_squared_error` (from both cuml and sklearn) over the 15 runs. Something to note is that the issue worsens as the dataset gets bigger: here 25k samples / 100 features, while performance is aligned across cards on smaller datasets of e.g. 10k samples / 50 features.
Test results (detailed test results below)
Release 21.06 seems to perform better than 21.10 in this test
------------- Release 21.06 ------------
The 1070ti's MSE is roughly half that of the 1050
GTX 1050:
GTX 1070ti:
------------- Release 21.10 ------------
MSE is comparable for 1070ti vs 1050 while significantly worse for RTX 4000
GTX 1050:
RTX 3060ti:
RTX 4000:
Test reproduction
System used is Ubuntu 20.04 & Cuda 11.2
The code used for the test is slightly adapted from a cuML documentation example (found here)
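The exact script isn't reproduced here; the following is a rough sketch of the test described above under assumed settings (default RandomForestRegressor hyperparameters, `test_size=0.25`, one seed per run), not the actual adapted documentation example:

```python
# Sketch of the described test: 15 runs of make_regression + RandomForestRegressor,
# averaging the cuml and sklearn MSE over the runs (hyperparameters assumed).
import cupy as cp
from cuml.datasets import make_regression
from cuml.model_selection import train_test_split
from cuml.ensemble import RandomForestRegressor
from cuml.metrics import mean_squared_error as cu_mse
from sklearn.metrics import mean_squared_error as sk_mse

N_RUNS = 15
cu_scores, sk_scores = [], []
for run in range(N_RUNS):
    X, y = make_regression(n_samples=25000, n_features=100, random_state=run)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=run
    )
    model = RandomForestRegressor()
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    # Flatten y_test per the metric discussion earlier in the thread.
    cu_scores.append(float(cu_mse(cp.ravel(y_test), preds)))
    sk_scores.append(sk_mse(cp.asnumpy(y_test).ravel(), cp.asnumpy(preds)))

print("avg cuml MSE:", sum(cu_scores) / N_RUNS)
print("avg sklearn MSE:", sum(sk_scores) / N_RUNS)
```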
Detailed tests results
------------- Release 21.06 ------------ Server 1: Ubuntu 20.04, Cuda 11.2
GTX 1050
GTX 1070ti
------------- Release 21.10 ------------ Server 2: Ubuntu 20.04, Cuda 11.2
GTX 1050
RTX 3060ti
RTX 4000