seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de
Other
398 stars 81 forks source link

output_sequence1_id and output_sequence2_id are same #3205

Open Jappy0 opened 9 months ago

Jappy0 commented 9 months ago

Hi there,

Thanks so much for this excellent C++ library for bioinformatics. I had a question when I was using it; please have a look at the following,

Platform

Question

I used the following configurations to get pairwise alignment from a vector. I found the output pairwise sequence id-1 and id-2 are the same. Is this problem a bug?

#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/utility/views/pairwise_combine.hpp>

int main()
{
    using namespace seqan3::literals;

    std::vector vec{"ACGTGACTGACT"_dna4, "ACGAAGACCGAT"_dna4, "ACGTGACTGACT"_dna4, "AGGTACGAGCGACACT"_dna4};

    auto config = seqan3::align_cfg::method_global{} | seqan3::align_cfg::edit_scheme | seqan3::align_cfg::min_score{-7}
                | seqan3::align_cfg::output_score{} | seqan3::align_cfg::output_sequence1_id{}
                | seqan3::align_cfg::output_sequence2_id{};

    auto alignment_results = seqan3::align_pairwise(seqan3::views::pairwise_combine(vec), config);
    auto filter_v = std::views::filter(
        [](auto && res)
        {
            return res.score() >= -6;
        });

    for (auto const & result : alignment_results | filter_v)
        seqan3::debug_stream << result << '\n';

    // {sequence1 id: 0, sequence2 id: 0, score: -4}
    // {sequence1 id: 1, sequence2 id: 1, score: 0}
    // {sequence1 id: 3, sequence2 id: 3, score: -4}
}

Thanks for your time.

Best regards,

Jappy

SGSSGene commented 9 months ago

Hi Jappy0

Thank you for your report. There is indeed a design flaw in the align_pairwise method. It is not possible for it to report sequence ids. Instead it is reporting the index of the alignment-pair. We are discussing a solution.

For you to continue working, you need to create a list which does the same mapping as seqan3::views::pairwise_combine(vec) for indices and then convert it to the real sequence ids.

Here a detailed adjustment of your final for loop:

    // Create a mapping, that maps from alignment-pair-ids → sequence-ids
    auto pair_indices = seqan3::views::pairwise_combine(std::views::iota(0ul, vec.size()));
    for (auto const & result : alignment_results | filter_v)  {
        seqan3::debug_stream << result << '\n';

        // The sequence1_id is reporting the index of the alignment-pair, not the ids of the pairs them self
        auto alignment_id = result.sequence1_id(); // result.sequence1_id() == result.sequence2_id()

        auto [sid1, sid2] = pair_indices[alignment_id]; // alignment-pair-id -> sequence-id
        seqan3::debug_stream << "sid1: " << sid1 << "; sid2: " << sid2 << "\n"; // the real sequence ids
    }

I will keep this issue open, until we have some kind of proper solution in seqan

eseiler commented 9 months ago

Here's a godbolt link showing both versions: https://godbolt.org/z/q6dMYajef

Jappy0 commented 9 months ago

Here a detailed adjustment of your final for loop:

Hi Simon,

I see, thanks so much for the quick solution.

Jappy0 commented 9 months ago

Here's a godbolt link showing both versions: https://godbolt.org/z/q6dMYajef

Hi Enrico, Thanks so much for the examples.