rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] There is a mismatch between the definition and the invocation of `dbscanFitImpl` #5370

Open georgeliu95 opened 1 year ago

georgeliu95 commented 1 year ago

Describe the bug The definition of `dbscanFitImpl` takes `max_mbytes_per_batch`, a memory limit expressed in megabytes. At the call site, however, the value passed is `max_bytes_per_batch`, which is expressed in bytes, so the effective limit is scaled up by 1,000,000x. This can lead to an out-of-memory error when the resulting limit exceeds the available memory.

Steps/Code to reproduce bug No.

Expected behavior `max_bytes_per_batch` should be divided by 1e6 before being passed to `dbscanFitImpl`.
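To make the unit mismatch concrete, here is a minimal, self-contained sketch. It does not use the real cuML signatures: `impl_batch_budget_bytes` and `bytes_to_mbytes` are hypothetical stand-ins, and the factor of 1,000,000 is taken from the description above. It only illustrates how passing a byte count into a megabyte parameter inflates the budget, and how dividing by 1e6 at the call site restores it.

```cpp
#include <cstdint>

// Hypothetical stand-in for an implementation that takes its limit in
// megabytes and converts it to bytes internally (NOT the cuML function).
std::uint64_t impl_batch_budget_bytes(std::uint64_t max_mbytes_per_batch) {
  return max_mbytes_per_batch * 1000000ULL;
}

// Hypothetical caller-side conversion from bytes to megabytes.
// Integer division rounds down.
std::uint64_t bytes_to_mbytes(std::uint64_t max_bytes_per_batch) {
  return max_bytes_per_batch / 1000000ULL;
}
```

With a caller-side limit of 2,000,000,000 bytes, `impl_batch_budget_bytes(2000000000)` yields a budget of 2e15 bytes, while `impl_batch_budget_bytes(bytes_to_mbytes(2000000000))` yields the intended 2e9 bytes. Because the division rounds down, a limit below 1,000,000 bytes becomes 0, so a real fix would need to decide how to handle that case.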


Additional context No.

dantegd commented 1 year ago

Thanks for the issue and for noticing the discrepancy @georgeliu95! The Python API is consistent with the implementation: https://github.com/rapidsai/cuml/blob/ab0e03be112aebc830873f1c98f7739fd0afd660/python/cuml/cluster/dbscan.pyx#L161 so this only needs a small fix for the discrepancy.

georgeliu95 commented 1 year ago

That's right, it works well with the Python API. Please also fix it in the C++ example. 😊