The aim of this PR is to refactor the code. Initially, the code created sub-datasets, because the plan was to export them as well, and they were very handy for running unit tests. As this is no longer part of the plan, I am refactoring the code to do only the tasks it is supposed to do, as described in the manuscript: create one large overview of all fusion genes. This refactoring reduces memory usage once more, and this time it is a big win for large numbers of samples.
So far, the output appears to be identical apart from its ordering.
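The idea of building a single overview instead of sub-datasets can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fusion is identified by an exact breakpoint tuple and that `build_overview` and `sorted_rows` are hypothetical helpers (the real matching is more involved). Each fusion maps to the set of samples it occurs in, so memory grows with the number of distinct (fusion, sample) entries rather than with the number of possible sample combinations. Sorting the rows gives a deterministic order, which is why comparing old and new outputs only needs to account for ordering.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical fusion key for illustration: (chrom_a, pos_a, chrom_b, pos_b)
Fusion = Tuple[str, int, str, int]


def build_overview(samples: Dict[str, Iterable[Fusion]]) -> Dict[Fusion, FrozenSet[str]]:
    """Map every fusion to the set of samples in which it was detected.

    Only one entry per distinct fusion is stored, instead of one
    sub-dataset per combination of samples.
    """
    overview: Dict[Fusion, Set[str]] = defaultdict(set)
    for sample, fusions in samples.items():
        for fusion in fusions:
            overview[fusion].add(sample)
    return {fusion: frozenset(found) for fusion, found in overview.items()}


def sorted_rows(overview: Dict[Fusion, FrozenSet[str]]) -> List[Tuple[Fusion, Tuple[str, ...]]]:
    """Return the overview in a deterministic order, so outputs can be compared."""
    return sorted((fusion, tuple(sorted(found))) for fusion, found in overview.items())


if __name__ == "__main__":
    samples = {
        "s1": [("chr1", 10000, "chr2", 20000)],
        "s2": [("chr1", 10000, "chr2", 20000), ("chr3", 500, "chr7", 900)],
    }
    for row in sorted_rows(build_overview(samples)):
        print(row)
```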
To-do list:
[x] Update changelog
[x] Convert all tests to use the old files (taking care of sorting), but run them with the new function
[x] Update README and explain the n-space complexity instead of the number of combinations
[x] Bump version to 3.*
[x] Add support for summary output file
[x] Add more logging to the new objects
[ ] When all tests pass and all points have been addressed, use the following notation in the list output to indicate strand and acceptor-donor direction: `chr1:10000(-)->chr2:20000(-)`
No legacy support is provided, because the old output was truncated in rare cases.
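The proposed list-output notation can be sketched as a small formatter and parser. This is a minimal sketch, assuming each side is written as `chrom:position(strand)` with strand `+` or `-` and the two sides joined by `->`; the names `Breakpoint`, `format_fusion` and `parse_fusion` are hypothetical and not part of the existing code. It deliberately does not decide which side is the donor or acceptor; it only preserves the order given by the arrow.

```python
import re
from typing import NamedTuple, Tuple


class Breakpoint(NamedTuple):
    """One side of a fusion in the hypothetical list-output notation."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


# One side looks like "chr1:10000(-)"; {p} is replaced by a group-name prefix.
_SIDE = r"(?P<{p}chrom>[^:\s]+):(?P<{p}pos>\d+)\((?P<{p}strand>[+-])\)"
_PATTERN = re.compile("^" + _SIDE.format(p="a") + "->" + _SIDE.format(p="b") + "$")


def format_fusion(left: Breakpoint, right: Breakpoint) -> str:
    """Render two breakpoints as 'chrA:posA(strandA)->chrB:posB(strandB)'."""
    return (f"{left.chrom}:{left.pos}({left.strand})"
            f"->{right.chrom}:{right.pos}({right.strand})")


def parse_fusion(text: str) -> Tuple[Breakpoint, Breakpoint]:
    """Parse the notation back into (left, right); raise ValueError if malformed."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"not in list-output notation: {text!r}")
    left = Breakpoint(m["achrom"], int(m["apos"]), m["astrand"])
    right = Breakpoint(m["bchrom"], int(m["bpos"]), m["bstrand"])
    return left, right


if __name__ == "__main__":
    left, right = parse_fusion("chr1:10000(-)->chr2:20000(-)")
    print(left, right)
    print(format_fusion(left, right))
```

Keeping parsing and formatting symmetric makes it easy to check in tests that every entry in the list output round-trips unchanged.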