rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] unable to pickle trained cuml dask KNeighborsClassifier #5600

Open dylanrstewart opened 1 year ago

dylanrstewart commented 1 year ago

Describe the bug

After fitting a cuml dask KNeighborsClassifier, the get_combined_model() method returns None.

Code to reproduce bug

import cudf
import cuml
import dask
import dask_cudf

from cuml.dask.datasets import make_blobs
from cuml.dask.neighbors import KNeighborsClassifier as KNN
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    cuml.internals.memory_utils.set_global_output_type("cudf")
    dask.config.set({"dataframe.backend": "cudf"})
    cluster = LocalCUDACluster(threads_per_worker=1)
    client = Client(cluster)

    n_workers = len(client.scheduler_info()["workers"].keys())
    X, y = make_blobs(
        n_samples=5000,
        n_features=30,
        centers=5,
        cluster_std=0.4,
        random_state=0,
        n_parts=n_workers * 5,
    )
    ddf = dask_cudf.from_cudf(cudf.DataFrame(X.compute()), npartitions=2)
    ddfy = dask_cudf.from_cudf(cudf.DataFrame(y.compute()), npartitions=2)
    knn = KNN(n_neighbors=5)
    knn.fit(ddf, ddfy)
    print(knn.get_combined_model())

Expected behavior

get_combined_model() should return a reference to the trained model, not None.

Environment details:

dantegd commented 1 year ago

Thanks for the issue, @dylanrstewart! We have a PR in flight to fix this (https://github.com/rapidsai/cuml/pull/5571) that is in the process of being merged. We'll test your example to see whether it's fixed; otherwise, we'll look into any further potential issues.
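
For anyone who wants to check the reproducer against a build that includes the fix, a pickle round trip along these lines may help. This is only a sketch: roundtrip_combined_model is a hypothetical helper, not part of cuML, and it assumes that get_combined_model() returns an object the standard pickle module can serialize once the bug is fixed.

```python
import pickle


def roundtrip_combined_model(dask_model):
    """Pickle and restore the combined model of a fitted cuml.dask estimator.

    Hypothetical helper (not a cuML API). Raises ValueError when
    get_combined_model() returns None, the behavior reported in this issue.
    """
    combined = dask_model.get_combined_model()
    if combined is None:
        raise ValueError("get_combined_model() returned None; nothing to pickle")
    # Serialize to bytes and restore, assuming the combined model is picklable.
    return pickle.loads(pickle.dumps(combined))
```

In the reproducer above, calling roundtrip_combined_model(knn) after knn.fit(ddf, ddfy) would raise ValueError while the bug is present, and should return a restored model otherwise.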