nanoporetech / duplex-tools

Splitting of sequence reads by internal adapter sequence search

Feature request: split_on_adapter print each alignment within a read that results in a split (multiple splits) #14

Open callumparr opened 2 years ago

callumparr commented 2 years ago

Would it be possible to save the output of the --print_alignment option to a file rather than only printing it to stdout, and to do this for every alignment found within a read when there are multiple splits?

onordesjo commented 2 years ago

We could consider doing this, though it'll be a bit hairy since the tool uses multiple processes.

onordesjo commented 2 years ago

How would you use the output, though? It can often be a pain to parse alignments that are dumped as ASCII. Perhaps a table of matches would be more useful?

read_id   start   end
befd-...  20      30
befd-...  120     130

callumparr commented 2 years ago

It is useful to have a complete record of where the alignments are and how close they are for all hits, as opposed to only the first hit, when a read is split multiple times.

At a minimum, it would be good to have the start and end positions of every alignment that meets the edit distance threshold, for each read. I can then extract those sequences from the original input reads myself to look at the alignments.
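
A minimal sketch of what such a per-hit record could look like, assuming 0-based, half-open coordinates and a plain dynamic-programming search for approximate matches. This is not how duplex-tools finds adapters internally; the function names `find_adapter_hits` and `write_hits_table`, the `edit_distance` column, and the example sequences are all hypothetical and only illustrate the idea of reporting every hit rather than the first one.

```python
import csv
import io


def find_adapter_hits(read, adapter, max_edits):
    """Return non-overlapping (start, end, edit_distance) hits of adapter in read.

    Hypothetical helper: semi-global edit-distance DP in which the adapter
    may start at any read position. Coordinates are 0-based, half-open.
    """
    m = len(adapter)
    # prev[i] = (edits, start) for adapter[:i] aligned so it ends at the previous read position
    prev = [(i, 0) for i in range(m + 1)]
    candidates = []
    for j in range(1, len(read) + 1):
        cur = [(0, j)]  # an empty adapter prefix can begin anywhere at no cost
        for i in range(1, m + 1):
            cost = 0 if adapter[i - 1] == read[j - 1] else 1
            cur.append(min(
                (prev[i - 1][0] + cost, prev[i - 1][1]),  # match or substitution
                (prev[i][0] + 1, prev[i][1]),             # extra base in the read
                (cur[i - 1][0] + 1, cur[i - 1][1]),       # adapter base missing from the read
            ))
        edits, start = cur[m]
        if edits <= max_edits:
            candidates.append((edits, start, j))
        prev = cur

    # Keep the best-scoring hits first and drop anything overlapping a kept hit.
    hits = []
    for edits, start, end in sorted(candidates):
        if all(end <= s or start >= e for s, e, _ in hits):
            hits.append((start, end, edits))
    return sorted(hits)


def write_hits_table(reads, adapter, max_edits, handle):
    """Write one tab-separated row per hit: read_id, start, end, edit_distance."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["read_id", "start", "end", "edit_distance"])
    for read_id, seq in reads.items():
        for start, end, edits in find_adapter_hits(seq, adapter, max_edits):
            writer.writerow([read_id, start, end, edits])


# Made-up example: one exact adapter copy and one copy with a single substitution.
adapter = "ACGTACGT"
reads = {"read-1": "TTTT" + "ACGTACGT" + "GGGGGG" + "ACGTTCGT" + "CCCC"}

table = io.StringIO()
write_hits_table(reads, adapter, 1, table)
print(table.getvalue(), end="")

# The matched sequences can be sliced straight out of the original read.
for start, end, edits in find_adapter_hits(reads["read-1"], adapter, 1):
    print(reads["read-1"][start:end], edits)
```

With a table like this, each row can be joined back to the input reads by `read_id`, and the `start`/`end` columns are enough to slice out the matched region for a closer look.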

onordesjo commented 2 years ago

Makes sense. I've opened an MR internally that should be reviewed fairly soon.

callumparr commented 2 years ago

> Makes sense, I've made a MR internally that should be looked through quite soon

Thanks!