rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

Add `to_sklearn` and `from_sklearn` APIs to serialize to CPU sklearn-estimators for supported models #6102

Open. Opened by dantegd 1 month ago.

betatim commented 2 days ago

Why have the methods do both the conversion (cuml<>sklearn) and the serialisation? A way to convert to and from scikit-learn seems useful by itself: maybe you have your own way of serialising the model, maybe you need a particular type of model, or who-knows-what.

To serialise it, you'd then do something like pickle.dumps(cuml_est.to_sklearn()) (or use dill, joblib, ...).

How hard would it be to have cuml.from_sklearn(estimator)? That is, one top-level function that takes a scikit-learn estimator and converts it to its cuml equivalent. It seems like it should be easy to figure out the estimator's class name ("just look at .__class__.__name__"), but I wonder if there is a trap here.


Name bikeshedding: if we don't save things to a file, how about as_sklearn? That would mirror other functions that do type conversion, like astype.

If we do save to a file, how about save_sklearn and load_sklearn? The point is to get words like "save" and "load" into the names, to make it clear that this is about storing things (to a file).
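If both spellings were offered, the file-based pair could be thin wrappers that compose an in-memory conversion with pickle. A minimal sketch follows, assuming (hypothetically) that the GPU estimator exposes an as_sklearn() method returning a CPU estimator; save_sklearn, load_sklearn and FakeGpuEstimator are the names floated in this thread or illustrative stand-ins, not existing cuML API.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Callable, Optional, Union

PathLike = Union[str, Path]


def save_sklearn(gpu_estimator: Any, path: PathLike) -> None:
    """Hypothetical: convert to the CPU equivalent, then pickle it to `path`."""
    cpu_estimator = gpu_estimator.as_sklearn()  # assumed conversion method
    with open(path, "wb") as f:
        pickle.dump(cpu_estimator, f)


def load_sklearn(path: PathLike, convert: Optional[Callable[[Any], Any]] = None) -> Any:
    """Hypothetical: unpickle a CPU estimator, optionally converting it back."""
    with open(path, "rb") as f:
        cpu_estimator = pickle.load(f)  # only unpickle files you trust
    return convert(cpu_estimator) if convert is not None else cpu_estimator


class FakeGpuEstimator:
    """Hypothetical stand-in for a fitted GPU estimator."""

    def __init__(self, coef: list) -> None:
        self.coef_ = coef

    def as_sklearn(self) -> SimpleNamespace:
        # A real conversion would return a fitted sklearn estimator instance.
        return SimpleNamespace(coef_=list(self.coef_))


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_sklearn(FakeGpuEstimator([1.0, 2.0]), model_path)
    restored = load_sklearn(model_path)
    print(restored.coef_)  # [1.0, 2.0]
```

Because the wrappers only add pickle on top of the conversion, users who prefer dill or joblib can call the conversion directly and skip them.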