rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License

Butina clustering fails due to lack of RAM on large datasets #6166

Status: Closed (closed 1 week ago). Opened by sakoht.

sakoht commented 1 year ago

When I run `rdkit.ML.Cluster.Butina` on a dataset of more than 100k molecules, it runs out of memory and crashes.
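For scale: when clustering with `Butina.ClusterData(dists, nPts, distThresh, isDistData=True)`, the usual input is a flat lower-triangle distance list with N(N-1)/2 entries. The rough estimate below assumes 8 bytes per entry for a packed float64 array and about 32 bytes per entry for a Python list of floats on 64-bit CPython (a 24-byte float object plus an 8-byte list slot). The helpers `condensed_size` and `estimate_gib` are illustrative only and are not part of RDKit.

```python
def condensed_size(n: int) -> int:
    """Number of entries in the flat lower-triangle distance list for n points."""
    return n * (n - 1) // 2


def estimate_gib(n: int, bytes_per_entry: int) -> float:
    """Rough memory estimate, in GiB, for storing that list (assumed entry size)."""
    return condensed_size(n) * bytes_per_entry / 2**30


if __name__ == "__main__":
    n = 100_000
    print(condensed_size(n))              # 4999950000 entries
    print(round(estimate_gib(n, 8), 1))   # assumption: packed float64 array
    print(round(estimate_gib(n, 32), 1))  # assumption: list of Python floats
```

At 100k molecules this comes to roughly 37 GiB even when tightly packed, and about 149 GiB as a list of Python floats, before any neighbor lists are built.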

I'm opening this issue because I already have a PR that lets it run, and I want to reference an issue from it. I know this affects others as well, because the first pages Google returns for molecule clustering show failed attempts to use this function, for example https://www.macinchem.org/reviews/clustering/clustering.php

A PR with narrowly focused changes to reduce memory usage will follow.
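The PR's actual changes are not shown in this thread. As one illustration of where memory can be saved, here is a minimal sketch, assuming fingerprints are stored as plain Python integer bitmasks (a hypothetical stand-in for RDKit bit vectors). Instead of materializing the flat N(N-1)/2 distance list that `Butina.ClusterData(..., isDistData=True)` expects, it compares pairs on the fly and keeps only neighbor lists within the distance threshold, then picks the unassigned point with the most neighbors as each centroid. The names `tanimoto`, `neighbor_lists` and `butina_low_memory` are hypothetical, are not RDKit API, and this is not presented as the approach taken in the PR.

```python
from typing import List, Sequence, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bitmasks."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention used in this sketch for two empty fingerprints
    return bin(a & b).count("1") / union


def neighbor_lists(fps: Sequence[int], dist_thresh: float) -> List[List[int]]:
    """Build neighbor lists pair by pair; no N*(N-1)/2 distance list is kept."""
    nbrs: List[List[int]] = [[] for _ in fps]
    for i in range(1, len(fps)):
        for j in range(i):
            if 1.0 - tanimoto(fps[i], fps[j]) <= dist_thresh:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def butina_low_memory(fps: Sequence[int], dist_thresh: float) -> Tuple[Tuple[int, ...], ...]:
    """Butina-style clustering: the densest unassigned point becomes a centroid."""
    nbrs = neighbor_lists(fps, dist_thresh)
    # Visit points by decreasing neighbor count (ties broken by lower index).
    order = sorted(range(len(fps)), key=lambda k: (-len(nbrs[k]), k))
    seen = [False] * len(fps)
    clusters = []
    for idx in order:
        if seen[idx]:
            continue
        seen[idx] = True
        members = [idx]  # centroid is listed first
        for nbr in nbrs[idx]:
            if not seen[nbr]:
                seen[nbr] = True
                members.append(nbr)
        clusters.append(tuple(members))
    return tuple(clusters)


if __name__ == "__main__":
    fps = [0b1111, 0b1110, 0b0111, 0b110000, 0b110001]
    print(butina_low_memory(fps, 0.4))  # ((0, 1, 2), (3, 4))
```

The number of pair comparisons is still O(N^2); only the storage changes, scaling with the number of pairs within the threshold rather than with all pairs. A loose threshold can therefore still produce very large neighbor lists.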

github-actions[bot] commented 3 weeks ago

This issue was marked as stale because it has been open for 90 days with no activity.

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 14 days since being marked as stale.