Experimental command line interface UX

dantegd commented 1 week ago

This PR adds a first version of a command line user experience that covers the following estimators:

Linear Regression, Ridge, Lasso and ElasticNet
Logistic Regression
PCA and tSVD
DBSCAN, KMeans and HDBSCAN
UMAP and TSNE
Nearest Neighbors

betatim commented 1 week ago

I ran the following small snippet to see things in action, but I'm now puzzled about whether or not cuml was used. Is there an easy way to tell (assume I'm a simple-minded user who isn't going to dig into the cuml codebase)?

import cuml.experimental.accel
cuml.experimental.accel.install()

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, y = make_blobs()
km = KMeans()

km.fit(X, y)
print(f"{km.cluster_centers_=}")
print(km.score(X, y))

This outputs the following:

Installing cuML Accelerator...
[I] [08:33:03.570100] Non Estimator Function Dispatching disabled...
[I] [08:33:03.605120] Non Estimator Function Dispatching disabled...
[I] [08:33:03.607562] Non Estimator Function Dispatching disabled...
km.cluster_centers_=array([[ 8.51813728,  0.89449653],
       [ 5.36304509, -9.09408513],
       [-1.06137904,  6.52824416],
       [ 7.0920223 , -1.11348216],
       [ 6.98095313, -8.23207799],
       [ 6.79229768, -9.76694763],
       [ 0.20774067,  7.58842924],
       [ 6.71965882,  1.64106257]])
-85.08620849985817

I was expecting to see either a log message saying "This was run on the GPU!" (or something similarly positive and simple) or, as an alternative, something like what I proposed in scikit-image, where we issue a DispatchNotification (via the warnings system) that lets people know the code ran differently from how it would have without dispatching enabled.
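To make the notification idea concrete, here is a minimal sketch using only the standard `warnings` module. The `DispatchNotification` class defined here, its `UserWarning` base class, the `notify_dispatch` helper and the stand-in `fit_kmeans_on_gpu` function are all assumptions for illustration; none of them is taken from cuml or scikit-image.

```python
import warnings


class DispatchNotification(UserWarning):
    """Hypothetical warning category: a call was dispatched to another backend."""


def notify_dispatch(name, backend="cuml"):
    """Emit a DispatchNotification for the dispatched callable `name`."""
    warnings.warn(
        f"{name} was dispatched to {backend} and ran on the GPU.",
        DispatchNotification,
        stacklevel=2,  # attribute the warning to the caller of this helper
    )


def fit_kmeans_on_gpu():
    """Stand-in for an accelerated method; no GPU work happens here."""
    notify_dispatch("KMeans.fit")
    return "fitted"
```

Because the notification is an ordinary warning category, users who do not want it could silence it with `warnings.filterwarnings("ignore", category=DispatchNotification)`.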

The second thing I thought might tell me if it was dispatched was inspecting a fitted attribute, though I guess cuml array works hard to make that hard :-/
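As a rough sketch of that second idea, one could look at which top-level package defines the type of a fitted attribute. The `backend_of` helper is hypothetical. The output above prints `cluster_centers_` as a plain `array(...)`, so this check may well be uninformative with the accelerator installed.

```python
def backend_of(obj):
    """Return the top-level package name that defines type(obj)."""
    # type(obj).__module__ is e.g. "numpy" or "collections"; keep the first part.
    return type(obj).__module__.split(".")[0]
```

With the snippet above, `backend_of(km.cluster_centers_)` would return `"numpy"` if the attribute has already been converted to a NumPy array, which says nothing about where the fit itself ran.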

betatim commented 6 days ago

In general I think we can fix/change most things here after people start trying it.

These are things I'd fix before merging:

remove commented out code and print statements
fix docstring formatting, triple quotes, grammar, etc. (IMHO docstrings that aren't nicely done are like a messy workshop: it doesn't mean the mechanic is any less good, but the first impression is)
deal with things like KMeans(8) so that we don't skip parameters by accident. Also why does it show up as args?
add a log message or dispatch notification (via the warnings system) to let people know "Congratulations, your code is running on a GPU! Time to celebrate!" Given this is all about making people use GPUs, I think it is important to make it 120% clear to users that they just got accelerated
clean up the existing log messages. Either by making them more detailed or removing them for now
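For the `KMeans(8)` item, one possible approach is to bind positional arguments to their parameter names with `inspect.signature` before forwarding them, so nothing passed positionally is dropped. This is only a sketch: `KMeansLike` is a hypothetical stand-in class and `normalize_init_args` a hypothetical helper; neither reflects how the PR actually handles arguments.

```python
import inspect


class KMeansLike:
    """Hypothetical stand-in for an estimator; not the scikit-learn class."""

    def __init__(self, n_clusters=8, *, init="k-means++", max_iter=300):
        self.n_clusters = n_clusters
        self.init = init
        self.max_iter = max_iter


def normalize_init_args(cls, *args, **kwargs):
    """Map positional and keyword constructor arguments to parameter names."""
    sig = inspect.signature(cls.__init__)
    # None is a placeholder for `self`; bind raises TypeError on bad arguments.
    bound = sig.bind(None, *args, **kwargs)
    params = dict(bound.arguments)
    # Drop the placeholder bound to the first parameter (conventionally `self`).
    params.pop(next(iter(sig.parameters)))
    return params
```

With this, `normalize_init_args(KMeansLike, 8)` yields `{"n_clusters": 8}`, so a positional `8` is carried along as a named parameter instead of showing up as anonymous args.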

betatim commented 1 day ago

For me it's fine to merge. We can always keep working on things.

dantegd commented 1 day ago

/merge

rapidsai / cuml

Experimental command line interface UX #6135