Open user06039 opened 3 years ago
This code seems to be working:

```python
import cuml
from sklearn.cluster import KMeans as skKMeans
from cuml.cluster import KMeans as cuKMeans
from sklearn.datasets import make_blobs
from numpy.testing import assert_equal

X, _ = make_blobs(n_samples=1000, n_features=10, centers=8)

# Fit on the GPU with cuML.
cuModel = cuKMeans()
cuModel.fit(X)

# Transfer the fitted attributes to a scikit-learn estimator.
skModel = skKMeans()
with cuml.using_output_type("numpy"):
    skModel.labels_ = cuModel.labels_
    skModel.cluster_centers_ = cuModel.cluster_centers_
skModel._n_threads = 1

assert_equal(cuModel.predict(X), skModel.predict(X))
```
Also see sklearn's Model persistence page
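To make the persistence part concrete, here is a hedged, CPU-only sketch (pure scikit-learn, no GPU required) of pickling a fitted KMeans model and reloading it for prediction on another machine; in a real deployment the bytes would be written to a file rather than kept in memory:

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, n_features=10, centers=8, random_state=0)

# Stand-in for the transferred model: a KMeans fitted on the CPU.
model = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)

# Serialize to bytes (in production, write these to disk).
blob = pickle.dumps(model)

# On the CPU-only inference machine: deserialize and predict.
restored = pickle.loads(blob)
assert np.array_equal(model.predict(X), restored.predict(X))
```

The same round-trip applies to a scikit-learn model whose attributes were populated from a cuML model as shown above, since pickling captures `cluster_centers_` and the other assigned attributes.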
@viclafargue Thank you, this seems to be a really interesting trick. Is there any disadvantage to doing this?
Also, why do we need to set `skModel._n_threads = 1`?
I don't see any disadvantage, apart from the fact that this method may not work with every estimator. Note that if you're only interested in storing your trained cuML estimator, it is possible to persist it with pickling. It will then be redeployed to the GPU, allowing faster predictions/transformations.
> Also, why do we need to set `skModel._n_threads = 1`?
This is something specific to scikit-learn's KMeans code. It needs to be specified to avoid a crash during prediction. To my understanding, it sets the number of OpenMP threads to use.
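To illustrate why this private attribute matters, here is a minimal CPU-only sketch of the attribute-transfer pattern: an unfitted scikit-learn KMeans whose `cluster_centers_` are populated externally (random centers here stand in for GPU-trained ones), with `_n_threads` set as above. Note that `_n_threads` is internal scikit-learn API and may change between versions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, n_features=10, centers=8, random_state=0)

# Pretend these centers came from a GPU-trained model (here: random values).
rng = np.random.default_rng(0)
centers = rng.standard_normal((8, 10))

model = KMeans(n_clusters=8)
model.cluster_centers_ = centers.astype(np.float64)
model._n_threads = 1  # private attribute; predict() needs it set

labels = model.predict(X)
assert labels.shape == (1000,)
```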
@viclafargue Thanks for clarifying. This saved hours of re-training with scikit-learn's KMeans implementation. I think there should be a way to do this directly in cuML, since not everyone uses GPUs in their production environment for inference.
Is there a way I could turn this post into a feature request?
@John-8704 turning it into a feature request would be very welcome
@dantegd I have edited the post, I hope that would suffice. I guess someone should change the labels attached to this post.
This issue has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled `inactive-90d` if there is no activity in the next 60 days.
We will consider this a feature request for simplification of this process in a future release (and documenting better). Thank you for filing!
This issue has been labeled `inactive-90d` due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
I am trying to use KMeans in cuML to fit the data, but I want to run inference/prediction on the CPU. Is that possible somehow? I really need a way to predict on CPU. Please help
EDIT:
I feel this would be a useful feature for the community: since training and tuning are the more resource-intensive steps, using a GPU there makes sense, but for inference a CPU machine should do a decent job in production.