Open ekg opened 5 years ago
I don't like the idea of using a filter based on the cigar string because it is not always present in all files. But this is not a fundamental problem.
To understand the idea of the filter, for this overlap:
A 100 50 100 + B 100 0 50 10 50 50 255 cg:Z:15I10X15I
fpa must either split this overlap and output its two "good" parts, or just filter out the overlap
I don't like the idea of using a filter based on the cigar string because it is not always present in all files. But this is not a fundamental problem.
I do understand you. I appreciate this is a new direction for fpa, as you haven't worked with these strings before. On my side, I can't really work without the cigar strings.
To understand the idea of the filter, for this overlap: fpa must either split this overlap and output its two "good" parts, or just filter out the overlap
That'd be the idea. No worries if this isn't something trivial for you to do or useful for your work. I can implement the modifier in another context.
At the moment my parser ignores the optional fields of the PAF, and it would take time to adapt it and write a cigar string parser.
This feature seems very interesting and important to me, but it requires a lot of code to be written and I unfortunately don't have time to write it yet.
If you want to have this behaviour quickly, you may have to develop it yourself.
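For anyone picking this up, here is a minimal sketch of the parsing step discussed above: reading the optional `cg:Z:` field of a PAF record and turning the cigar string into (length, operation) pairs. It is written in Python purely for illustration and is not fpa code; the function names `parse_cigar` and `cigar_from_paf` are hypothetical, and it assumes a tab-separated record with the 12 mandatory PAF columns.

```python
import re

# One CIGAR element: a run length followed by an operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Return (length, op) pairs, e.g. "15=10X" -> [(15, "="), (10, "X")]."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to reject input the regex did not fully consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops

def cigar_from_paf(line):
    """Return the parsed cg:Z: CIGAR of one PAF record, or None if absent."""
    fields = line.rstrip("\n").split("\t")
    # Optional tags follow the 12 mandatory PAF columns.
    for tag in fields[12:]:
        if tag.startswith("cg:Z:"):
            return parse_cigar(tag[5:])
    return None

# Made-up record, for illustration only.
record = "A\t100\t50\t100\t+\tB\t100\t0\t50\t40\t50\t255\tcg:Z:15=10X25="
print(cigar_from_paf(record))
```

Returning `None` when the tag is missing lets a caller pass such records through unchanged, which matters here because the cigar string is not present in all files.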
I'd like to remove parts of alignments that have low identity. The idea would be to take a longer alignment and break it into multiple alignments, removing regions where the identity drops below some threshold over a window of a given length. This would have to work on top of alignments with cigar strings.
The goal is to provide a controllable limit to collapse between diverged regions of sequences in graphs that are built from PAF-based alignments. Applying this filter should make the graph have more large bubbles and be more "open", but have fewer small bubbles.
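A rough sketch of the filter described above, in Python and under several assumptions: the cigar string uses `=`/`X` operations (a plain `M` is counted as a match because it cannot be told apart), identity is the fraction of matching columns in a fixed-size sliding window, and every column covered by at least one failing window is removed. The function name `split_low_identity` and its parameters are hypothetical. Coordinates are returned as offsets from the start of the alignment, assuming the forward strand.

```python
import re

def split_low_identity(cigar, window=10, min_identity=0.8):
    """Split an alignment at windows whose identity is below min_identity.

    Returns (q_start, q_end, t_start, t_end) offsets, relative to the start
    of the alignment, for each segment that is kept.
    """
    # Expand the CIGAR into one operation per alignment column.
    cols = []
    for n, op in re.findall(r"(\d+)([MIDX=])", cigar):
        cols.extend(op * int(n))
    if not cols:
        return []
    window = min(window, len(cols))

    # Prefix sums of matching columns give each window's match count in O(1).
    prefix = [0]
    for op in cols:
        prefix.append(prefix[-1] + (1 if op in "=M" else 0))

    # Mark every column that lies inside at least one failing window.
    bad = [False] * len(cols)
    for i in range(len(cols) - window + 1):
        if (prefix[i + window] - prefix[i]) / window < min_identity:
            for j in range(i, i + window):
                bad[j] = True

    # Emit maximal runs of unmarked columns, tracking query/target offsets.
    segments, start = [], None
    q = t = 0
    for op, is_bad in zip(cols, bad):
        if not is_bad and start is None:
            start = (q, t)
        elif is_bad and start is not None:
            segments.append((start[0], q, start[1], t))
            start = None
        if op != "D":  # every operation except a deletion consumes query
            q += 1
        if op != "I":  # every operation except an insertion consumes target
            t += 1
    if start is not None:
        segments.append((start[0], q, start[1], t))
    return segments

print(split_low_identity("15=10X15=", window=10, min_identity=0.8))
# -> [(0, 8, 0, 8), (32, 40, 32, 40)]
```

Note that this variant also trims the matching columns that flank a diverged region, since they fall inside failing windows. A real implementation might instead cut only at the mismatching columns, and would need to rewrite the cigar string and the match counts for each output record.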