proteneer / timemachine

Differentiate all the things!
Other
138 stars 17 forks source link

Add atom mapping arguments to restrict the number of connected components in the core #1349

Closed mcwitt closed 1 month ago

mcwitt commented 1 month ago

Removes the connected_core option from get_cores and replaces with 2 more general options:

For now, keeps the current default behavior by setting max_connected_components=1 and min_connected_component_size=1 in DEFAULT_ATOM_MAPPING_KWARGS.

For equivalent settings, ~performance is slightly decreased (5-10% in my testing). This could probably be optimized more if necessary, but I was starting to hit diminishing returns (and think it might be preferable to prioritize clarity of the code over few-percent speedups here).~ After optimizations in https://github.com/proteneer/timemachine/pull/1349/commits/fba342f2c25b0847433e618f9c2ea1b1c902302e, it's about 10% faster than the current implementation with default settings.

Increasing either max_connected_components or min_connected_component_size from their defaults appears to drastically reduce search space and improve performance in some examples. The intuition seems to be that allowing many smaller connected components means that we find a larger overall core (and thus fewer outer loops over decreasing core size), but are less able to prune the search tree at each iteration; the optimal values are likely somewhere in between.