rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] Dask PCA #5883

Open Intron7 opened 4 months ago

Intron7 commented 4 months ago

Describe the bug

The value returned by transform has no shape, so it can't be integrated into an existing data structure without first calling compute_chunk_sizes() on it, because it has no known size.

Steps/Code to reproduce bug

from dask_cuda import LocalCUDACluster
from dask.distributed import Client, wait
import cupy as cp
from cuml.dask.decomposition import PCA
from cuml.dask.datasets import make_blobs

cluster = LocalCUDACluster(threads_per_worker=1)
client = Client(cluster)

nrows = 6
ncols = 3
n_parts = 2

X_cudf, _ = make_blobs(n_samples=nrows, n_features=ncols,
                       centers=1, n_parts=n_parts,
                       cluster_std=0.01, random_state=10,
                       dtype=cp.float32)

cumlModel = PCA(n_components=1, whiten=False)
XT = cumlModel.fit_transform(X_cudf)
print(XT.shape)
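
For readers without a GPU, the symptom can be sketched with plain Dask. This is only an illustration and makes two assumptions: that the object returned by fit_transform is a Dask array whose chunk sizes are unknown, and that a NumPy-backed array behaves the same way as a CuPy-backed one here. The arrays and the boolean mask below are made up for the example and are not part of cuML.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in data: 6 rows, 2 columns, split into 2 row chunks.
x = da.from_array(np.arange(12, dtype=np.float32).reshape(6, 2), chunks=(3, 2))

# Boolean indexing yields chunks whose lengths Dask cannot know in advance,
# so the first dimension of the shape is reported as NaN.
y = x[x[:, 0] > 2]
unknown_shape = y.shape
print(unknown_shape)

# compute_chunk_sizes() evaluates the chunk lengths and updates the array in place.
y.compute_chunk_sizes()
known_shape = y.shape
print(known_shape)
```

Once the chunk sizes are known, the array can be sliced by position or combined with other arrays that need a concrete shape. The cost is one extra computation pass over the data.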


dantegd commented 4 months ago

@Intron7 thanks for reporting this! I'll try to repro and find a fix as soon as we can.

Intron7 commented 4 months ago

@dantegd I have an easy workaround for this. #5555 is way more important imo.