Complex subset problem with n > 2 datasets

Suppose we have three datasets, each containing one fusion. In all three fusions the left junction is identical and spans the same gene, but the right junction spans a different (sub)set of genes:

- Genes dataset 1: [Left], [A,B]
- Genes dataset 2: [Left], [A,B,C]
- Genes dataset 3: [Left], [B,C]

Then the outcome depends on the order in which the datasets are compared:

Order 1:

- [A,B] + [A,B,C] → overlap: [A,B,C]
- [A,B,C] + [B,C] → overlap: **[A,B,C]**

Order 2:

- [A,B] + [B,C] → no overlap (both contain B, but neither set is a subset of the other)
- [no overlap] + [A,B,C] → **no overlap**
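The order dependence can be reproduced with a small Python sketch. It is a simplified, hypothetical model, not FuMa's actual code: each fusion is reduced to the set of genes spanned by its right junction (the identical left junction is omitted), two fusions match when one gene set is a subset of the other, the merged fusion keeps the larger set, and `None` stands for "no overlap". The names `merge`, `overlap_in_order` and `show` are illustrative only.

```python
from functools import reduce

# Hypothetical, simplified model (not FuMa's implementation): only the
# right-junction gene set of each fusion is kept, since [Left] is identical.
datasets = {
    1: frozenset({"A", "B"}),
    2: frozenset({"A", "B", "C"}),
    3: frozenset({"B", "C"}),
}


def merge(left, right):
    """Match two gene sets under a subset criterion.

    Returns the superset (the merged fusion) if one set is a subset of the
    other, otherwise None ("no overlap"). None propagates, so once a
    comparison fails, later comparisons cannot restore the overlap.
    """
    if left is None or right is None:
        return None
    if left <= right:
        return right
    if right <= left:
        return left
    return None


def overlap_in_order(order):
    # Pairwise fold: each intermediate result is used as a new dataset
    # in the next comparison.
    return reduce(merge, (datasets[i] for i in order))


def show(result):
    # Sort for stable output; frozenset iteration order is not defined.
    return "no overlap" if result is None else sorted(result)


print("Order 1:", show(overlap_in_order([1, 2, 3])))  # Order 1: ['A', 'B', 'C']
print("Order 2:", show(overlap_in_order([1, 3, 2])))  # Order 2: no overlap
```

In order 2 the superset [A,B,C], which would bridge [A,B] and [B,C], only arrives after the first comparison has already failed, so the same three datasets produce a different result.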

We expect this bug to be rare, but when it occurs, merely changing the order of the samples can change the outcome. Because of the object-oriented structure of the code (the merged datasets are themselves used as new datasets in subsequent comparisons), it is nearly impossible to fix this issue without a substantial loss in (runtime) performance. There are currently no plans to fix this bug.

yhoogstrate / fuma

Complex subset problem with n > 2 datasets #1