Open vikcost opened 1 month ago
@vikcost can you explain what you mean by DBSCAN failing to return any result? Does that mean there is a crash or something else going on?
@divyegala
After reviewing and waiting longer, I see that DBSCAN does return clustering results on datasets of 1_000_000 data points.
However, I expected faster performance from setting rmm_pool_size="24GB", but computation time increased slightly, from 303 sec to 312 sec.
cluster = LocalCUDACluster(protocol="ucx", rmm_pool_size="24GB")
This is unexpected, given that RMM is designed for advanced memory management. Am I setting these parameters incorrectly?
However, when I run clustering on 5_000_000 data points, I don't see the typical log output, such as:
[W] [22:35:28.663183] Batch size limited by the chosen integer type (4 bytes). 3998 -> 2147. Using the larger integer type might result in better performance
[W] [22:35:32.380082] Batch size limited by the chosen integer type (4 bytes). 3998 -> 2147. Using the larger integer type might result in better performance
...
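For what it's worth, the numbers in that warning look consistent with a cap where the batch size times the number of rows has to fit in a signed 4-byte integer. This is an assumption read off the log, not something confirmed against the cuML source, and `max_batch_rows` is a hypothetical helper used only for illustration:

```python
INT32_MAX = 2**31 - 1  # largest value a signed 4-byte integer can hold


def max_batch_rows(n_rows: int, requested: int) -> int:
    """Largest batch size with batch * n_rows <= INT32_MAX (assumed rule, not cuML's API)."""
    return min(requested, INT32_MAX // n_rows)


# 1_000_000 rows reproduces the "3998 -> 2147" reduction seen in the warning.
print(max_batch_rows(1_000_000, 3998))
# Under the same assumption, 5_000_000 rows would allow far smaller batches.
print(max_batch_rows(5_000_000, 3998))
```

If that assumption holds, the 5_000_000-point run would be split into many more, much smaller batches than the 1_000_000-point run.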
Also, GPU utilization is 0% and the script shows no signs of activity. How would one estimate the run time of multi-GPU DBSCAN as a function of the number of data points?
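One rough way to approach such an estimate is to extrapolate from the 303 sec measured at 1_000_000 points. The sketch below assumes the run time is dominated by brute-force pairwise distance computation and therefore grows quadratically with the number of points. That exponent is an assumption, not a measured property of the multi-GPU implementation, and `estimate_runtime_sec` is a hypothetical helper:

```python
def estimate_runtime_sec(
    n_points: int,
    ref_points: int = 1_000_000,  # size of the run that was actually timed
    ref_sec: float = 303.0,       # measured run time at ref_points
    exponent: float = 2.0,        # assumed scaling; fit it from more measurements if possible
) -> float:
    """Extrapolate run time as ref_sec * (n_points / ref_points) ** exponent."""
    return ref_sec * (n_points / ref_points) ** exponent


est = estimate_runtime_sec(5_000_000)
print(f"{est:.0f} sec (~{est / 3600:.1f} h)")
```

Under these assumptions a 5_000_000-point run would take about 7575 sec, roughly 2 hours, so a run that looks stalled may simply not have finished. Timing two or three intermediate sizes would show whether the quadratic exponent is realistic.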
Below is a minimal version of a test script for multi-GPU DBSCAN. I have six RTX 4090 GPUs on my machine that I want to utilize.
I observe memory allocations and deallocations on my GPUs, but DBSCAN fails to return any result.
Any idea where the issue might be coming from?
Environment details: