Closed yhoogstrate closed 9 years ago
This issue has been solved as of version 2.* by using subsets instead of supersets:
Order 1:
[A,B] + [A,B,C] → overlap: [A,B] (that's the subset)
[A,B] + [B,C] → no overlap
Order 2:
[A,B] + [B,C] → no overlap
[no overlap] + [A,B,C] → no overlap
Both orders will produce the same output.
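A minimal sketch of the subset rule, for illustration only. It assumes that two gene sets match when one is a subset of the other (consistent with the example, but not stated in this issue), and it uses a hypothetical helper `merge_subset` with `None` standing for "no overlap". It is not the project's actual code.

```python
from functools import reduce
from itertools import permutations


def merge_subset(left, right):
    """Merge two gene sets, keeping the subset when they match.

    Assumption: two sets match when one is a subset of the other.
    None stands for 'no overlap' and propagates through later merges.
    """
    if left is None or right is None:
        return None
    if left <= right or right <= left:
        return left & right  # the smaller of the two sets
    return None


# Right-junction gene sets of the three datasets in the example
datasets = [
    frozenset({"A", "B"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B", "C"}),
]

# Merge the three gene sets in every possible order
results = [reduce(merge_subset, order) for order in permutations(datasets)]
```

For the three gene sets of this example, every ordering ends in `None`, i.e. no overlap. The sketch only covers this example and makes no claim about arbitrary inputs.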
Suppose we have three datasets with one fusion in each dataset. For all three fusions the left junction is identical and spans the same gene, but the right junction spans a different (sub)set of genes:
Genes dataset 1: [Left],[A,B]
Genes dataset 2: [Left],[A,B,C]
Genes dataset 3: [Left],[B,C]
Then the outcome is dependent on the order of comparison:
Order 1:
[A,B] + [A,B,C] → overlap: [A,B,C]
[A,B,C] + [B,C] → overlap: [A,B,C]
Order 2:
[A,B] + [B,C] → no overlap
[no overlap] + [A,B,C] → no overlap
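The order dependence can be reproduced with a small sketch. It assumes that two gene sets match when one is a subset of the other and that a match keeps the superset, as in the example above. `merge_superset` is a hypothetical helper, and `None` stands for "no overlap". This is an illustration, not the project's implementation.

```python
from functools import reduce


def merge_superset(left, right):
    """Merge two gene sets, keeping the superset when they match.

    Assumption: two sets match when one is a subset of the other.
    None stands for 'no overlap' and propagates through later merges.
    """
    if left is None or right is None:
        return None
    if left <= right or right <= left:
        return left | right  # the larger of the two sets
    return None


# Right-junction gene sets of the three datasets in the example
d1 = frozenset({"A", "B"})
d2 = frozenset({"A", "B", "C"})
d3 = frozenset({"B", "C"})

order1 = reduce(merge_superset, [d1, d2, d3])  # dataset 1, then 2, then 3
order2 = reduce(merge_superset, [d1, d3, d2])  # dataset 1, then 3, then 2
```

Here `order1` ends as the set {A, B, C}, while `order2` ends as `None`, matching the two orders listed above.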
We expect this bug to be rare, but when it occurs the outcome can change merely by changing the order of the samples. Because of the object-oriented structure of the code (i.e. the concatenated datasets are used as novel datasets), it is nearly impossible to solve this issue without losing much (time) performance. It is not planned to solve this bug at the moment.