Closed mlinegar closed 2 years ago
Hi, thank you for your report. hdbscan calculates a distance matrix with n^2 entries of 8 bytes each. For your data, the size in GB is:
137649^2 * 8 / 2^30
[1] 141.168
Maybe your R process is not allowed to allocate that much?
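For other data sizes, the same estimate can be wrapped in a small helper. This is only a sketch: `dense_dist_gib` and `lower_tri_gib` are hypothetical names, not functions from any package, and both assume 8-byte doubles. The code does not show how hdbscan actually stores distances internally.

```r
# Hypothetical helper (not from any package): estimated size in GiB of a
# dense n x n matrix, assuming 8 bytes per value (R doubles).
dense_dist_gib <- function(n, bytes_per_value = 8) {
  n^2 * bytes_per_value / 2^30
}

# Hypothetical helper: size if only the lower triangle is stored,
# i.e. n * (n - 1) / 2 values, as a "dist" object does.
lower_tri_gib <- function(n, bytes_per_value = 8) {
  n * (n - 1) / 2 * bytes_per_value / 2^30
}

n <- 137649            # number of rows reported in this issue
dense_dist_gib(n)      # about 141.168, matching the estimate above
lower_tri_gib(n)       # roughly half of the dense estimate

# Either way, the number of entries exceeds the largest 32-bit integer,
# so the data can only be held in a long vector.
n * (n - 1) / 2 > .Machine$integer.max
```

Even the lower-triangle variant has more than 2^31 - 1 entries for this data set, so code that indexes the distances with 32-bit integers could fail regardless of how much RAM is available.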
Update: I have updated the hdbscan code to work with long vectors. The version on GitHub might address your problem.
I have been running into a segfault error when running hdbscan. I initially ran into the error when using the doc2vec library, which calls hdbscan. I only run into the error when running on my full set of data (137649 rows, ~300 MB), but not for a subset. The error still happens even if I try increasing minPts, or increasing the size of the server I am using (I have tried up to 600 GB RAM).

Is there any way around this error? Please let me know if there's anything I can do to help debug!
Here is the output of sessionInfo():