Intersection Finder - Githubissues

Given a FragmentedMolecule object we need to find the set of intersections (also called overlaps). Two fragments in a FragmentedMolecule object are said to intersect if there is at least one nuclei which appears in both fragments. The intersection of two fragments is the set of all nuclei appearing in both fragments. Intersections are not limited to two fragments, e.g. it is possible for three fragments to share a set of common nuclei. In general, for n fragments we will need to consider overlaps arising from two fragments at a time, three fragments at a time, four fragments at a time, all the way up to all n fragments at a time.

In addition to finding the intersections we will need to weight them. For this we use the inclusion-exclusion principle (IEP). If an intersection arises from an even number of fragments it gets a weight of -1, if it results from an odd number of fragments it gets a weight of 1.

In general finding intersections is an NP-hard problem and thus will be very expensive if not coded cleverly.
We can exploit the fact that if a pair of fragments have no common nuclei, then any triple of fragments containing that pair also has no common nuclei. Similarly if a triple of fragments has no common nuclei, then any quadruple containing that triple also has no common nuclei.

As some examples. Consider the following fragments of a 5 nuclei system (number indicate nucleus index and are 1 based):

1, 2, 3
1, 4
3, 4
4, 5

There are five non-empty pair-wise intersections:

(1, 2, 3) with (1, 4). Intersection (1) weight -1
(1, 2, 3) with (3, 4). Intersection (3) weight -1
(1, 4) with (3, 4). Intersection (4) weight -1
(1, 4) with (4, 5). Intersection (4) weight -1
(3, 4) with (4, 5). Intersection (4) weight -1

and one non-empty three-way intersection:

(1, 4) with (3, 4) with (4, 5). Intersections (4) weight 1.

The final set of intersections is:

(1) with a weight of: -1
(3) with a weight of: -1
(4) with a weight of: -1-1-1+1=-2

At the moment I'm not sure what the interface should look like so we can write a function which takes a FragmentedMolecule object and returns a std::map<std::vector<std::size_t>, float> object. The return is a key/value structure where the keys are the nuclei in the overlap and the values are their weights. For the above example the final answer would be (note indices are now 0-based to conform to C++ standards):

using overlap_type = std::vector<std::size_t>;
std::map<overlap_type, float> overlaps;
overlaps[overlap_type{0}] = -1.0;
overlaps[overlap_type{2}] = -1.0;
overlaps[overlap_type{3}] = -1.0;

rmrresearch / GhostFragment

Intersection Finder #48