seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de

alignment slice positions #3203

Open notestaff opened 1 year ago

notestaff commented 1 year ago


Question

Is there a documented way to extract, from a pairwise alignment result, the alignment slice positions (equivalent to the coordinates[] list of BioPython alignment objects)?

From looking at the code, it seems that the trace result returned from aligned_sequence_builder() includes this info in first_sequence_slice_positions / second_sequence_slice_positions, but that these are not carried over into the alignment result object. Is that correct?

Thanks for your help!
@eseiler @rrahn

rrahn commented 1 year ago

Hi @notestaff, your observation is correct. At the moment we have the infrastructure to allow user-defined alignment outputs, but incorporating this into the public configuration API of the alignments is still an open TODO. Before we do so, we wanted to collect more information about what is actually needed, so your request is very helpful in that sense.

To achieve the same thing right now, you would need to write your own adapter around the alignment object, which itself is just a std::tuple of two aligned_sequences. This simply means that each original source sequence is wrapped by a gap_decorator. You can iterate over the two gap_decorators and check with *it == seqan3::gap{} whether the currently referenced symbol is a gap or not. By adding some bookkeeping to track the last non-gap source position, you could provide an interface similar to the BioPython coordinates.
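A minimal sketch of that bookkeeping, written in plain standard C++ so that it compiles without seqan3. It is an illustration under stated assumptions, not library code: the two aligned rows are modeled as strings in which '-' marks a gap, and `slice_coordinates`, `coordinate_list`, `begin1` and `begin2` are hypothetical names. With real gap_decorators, the `== '-'` test would presumably become the `*it == seqan3::gap{}` check described above, and `begin1` / `begin2` would be the start offsets of the aligned slice in each source sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper type, not part of seqan3: one (pos1, pos2) pair per
// boundary between alignment blocks.
using coordinate_list = std::vector<std::pair<std::size_t, std::size_t>>;

// Walks both aligned rows column by column and records the source positions
// each time the gap pattern changes, plus the final end positions.
// Assumption: '-' stands in for a gap symbol.
coordinate_list slice_coordinates(std::string_view row1, std::string_view row2,
                                  std::size_t begin1 = 0, std::size_t begin2 = 0)
{
    assert(row1.size() == row2.size()); // aligned rows have equal length

    coordinate_list coords;
    std::size_t pos1 = begin1; // next unconsumed position in source sequence 1
    std::size_t pos2 = begin2; // next unconsumed position in source sequence 2
    bool prev_gap1 = false;
    bool prev_gap2 = false;

    for (std::size_t i = 0; i < row1.size(); ++i)
    {
        bool const gap1 = (row1[i] == '-');
        bool const gap2 = (row2[i] == '-');

        // A new block starts at the first column and wherever the gap state
        // of either row differs from the previous column.
        if (i == 0 || gap1 != prev_gap1 || gap2 != prev_gap2)
            coords.emplace_back(pos1, pos2);

        // Only non-gap symbols consume a position in the source sequence.
        if (!gap1) ++pos1;
        if (!gap2) ++pos2;

        prev_gap1 = gap1;
        prev_gap2 = gap2;
    }

    coords.emplace_back(pos1, pos2); // end positions close the last block
    return coords;
}
```

For the rows "ACG-T" and "A-GCT", this yields the pairs (0,0), (1,1), (2,1), (3,2), (3,3), (4,4). Read per sequence, that is 0,1,2,3,3,4 and 0,1,1,2,3,4, which is the layout a coordinates array would use for this alignment.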

Please let me know if you need more information on this. Best regards!