Closed mcwitt closed 3 weeks ago
Also, `atom_mapping.py` is not currently in the diff, but I think there may be points in this documentation that should be updated:
Updated in https://github.com/proteneer/timemachine/pull/1415/commits/11610994e827997ab78cc778d596a3fc4b19987b and https://github.com/proteneer/timemachine/pull/1415/commits/65fc2aad961a3f03645779921ce65cf2a7eff014
this PR is :fire::fire::fire::fire::fire:
Modifies the search strategy in the maximum common substructure (MCS) search (used during atom mapping) to greatly reduce the number of nodes that must be searched in certain problem instances.
Background
The current approach is based on McGregor 1982, and proceeds by depth-first search (DFS) over all possible mappings `A -> Optional[B]`. The tree is pruned by comparing an upper bound on the number of edges a branch can preserve against the number of edges the mapping is required to preserve (in the sense that if a1 maps to b1 and a2 maps to b2, then `adj(a1, a2) == adj(b1, b2)`), as follows:
1. Initialize `min_num_edges = min(edges in A, edges in B)`.
2. During DFS, prune branches whose `num_edges_upper_bound` is less than `min_num_edges`.
3. If the search finds no mapping, decrement `min_num_edges` by one and restart DFS.
This strategy, while efficient in cases where the number of edges preserved by an optimal mapping is close to `min(edges in A, edges in B)`, requires searching a large number of nodes when this is not the case.
Changes
This PR changes the search strategy as follows:
1. Remove the `min_num_edges` threshold. By itself, this change is catastrophically inefficient, since we no longer have any mechanism for pruning the search tree.
Validation
Public FEP Benchmarks
a = master, b = this branch
For each instance, the mapping generated by b was verified to be identical to that generated by a.
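For reference, the baseline strategy described under Background (DFS with a `min_num_edges` threshold that is decremented on failure) can be sketched roughly as below. This is a minimal illustration, not the timemachine implementation: the function names `search` and `mcs_threshold_restart`, the graph representation (node counts plus sets of undirected edges), and the simple upper bound are all assumptions made for the example.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Edge = FrozenSet[int]
Mapping = List[Optional[int]]  # mapping[a] is the atom of B assigned to atom a of A, or None


def edge(i: int, j: int) -> Edge:
    return frozenset((i, j))


def search(n_a: int, edges_a: Set[Edge], n_b: int, edges_b: Set[Edge],
           min_num_edges: int) -> Optional[Mapping]:
    """DFS over mappings A -> Optional[B]; returns the first complete mapping
    that preserves at least min_num_edges edges, or None if there is none."""

    def num_edges_upper_bound(mapping: Mapping) -> int:
        # Hypothetical, deliberately simple bound: an edge of A can still be
        # preserved unless one of its endpoints was already assigned None.
        k = len(mapping)
        return sum(all(a >= k or mapping[a] is not None for a in e) for e in edges_a)

    def consistent(mapping: Mapping, a2: int, b2: int) -> bool:
        if b2 in mapping:  # keep the mapping injective
            return False
        # Require adj(a1, a2) == adj(b1, b2) for every already-mapped pair.
        return all((edge(a1, a2) in edges_a) == (edge(b1, b2) in edges_b)
                   for a1, b1 in enumerate(mapping) if b1 is not None)

    def dfs(mapping: Mapping) -> Optional[Mapping]:
        if num_edges_upper_bound(mapping) < min_num_edges:
            return None  # prune: this branch cannot reach the threshold
        if len(mapping) == n_a:
            return mapping  # complete mapping; the bound is exact here
        a2 = len(mapping)
        for b2 in [*range(n_b), None]:
            if b2 is None or consistent(mapping, a2, b2):
                result = dfs(mapping + [b2])
                if result is not None:
                    return result
        return None

    return dfs([])


def mcs_threshold_restart(n_a: int, edges_a: Set[Edge],
                          n_b: int, edges_b: Set[Edge]) -> Tuple[Mapping, int]:
    """Start from the most optimistic threshold and restart DFS with the
    threshold lowered by one each time the search fails."""
    min_num_edges = min(len(edges_a), len(edges_b))
    while True:
        mapping = search(n_a, edges_a, n_b, edges_b, min_num_edges)
        if mapping is not None:
            return mapping, min_num_edges
        min_num_edges -= 1  # terminates: the all-None mapping satisfies threshold 0


# Example: A is a 3-atom path, B is a triangle.
path = {edge(0, 1), edge(1, 2)}
triangle = {edge(0, 1), edge(1, 2), edge(0, 2)}
print(mcs_threshold_restart(3, path, 3, triangle))  # ([0, 1, None], 1)
```

In this example the first pass (threshold 2) explores the tree and fails before the second pass (threshold 1) succeeds, which illustrates the cost noted above: every failed threshold implies another full restart of the DFS.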