When I run `rdkit.ML.Cluster.Butina` on a dataset of over 100k molecules, it runs out of memory and crashes.
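For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the common workflow where a precomputed lower-triangle distance list is passed to `Butina.ClusterData(..., isDistData=True)`; the helper names and the 8 bytes per entry are my own assumptions for illustration. A plain Python list of floats costs noticeably more per entry, so treat these figures as a lower bound.

```python
def condensed_entries(n_items: int) -> int:
    """Number of pairwise distances in the lower triangle for n_items points."""
    return n_items * (n_items - 1) // 2


def estimated_bytes(n_items: int, bytes_per_entry: int = 8) -> int:
    """Lower-bound memory estimate, assuming one 8-byte double per distance."""
    return condensed_entries(n_items) * bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        gib = estimated_bytes(n) / 2**30
        print(f"{n:>7} molecules: {condensed_entries(n):>13,} distances, ~{gib:.2f} GiB")
```

At 100k molecules that is roughly 5 billion distances, i.e. tens of GiB before any neighbor lists are built.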
I'm opening this issue because I already have a PR that allows it to run and want an issue to reference. I know this is a problem for others as well, because the first web pages Google returns for molecule clustering show attempts to use this that fail:
https://www.macinchem.org/reviews/clustering/clustering.php
PR to follow with narrowly focused changes to reduce memory usage...