seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de

Convert vector<dna5> to std::string #3218

Open nhhaidee opened 7 months ago

nhhaidee commented 7 months ago

Question

With the following code, is there any way to convert a vector<dna5> to std::string? I want to work with a standard C++ string.

Thanks, Hai

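#include <sstream> // std::istringstream

#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/sequence_file/all.hpp>

auto input = R"(>TEST1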
ACGT
>Test2
AGGCTGA
>Test3
GGAGTATAATATATATATATATAT)";

int main(int argc, const char *argv[]) {

    using sequence_file_input_type =
            seqan3::sequence_file_input<seqan3::sequence_file_input_default_traits_dna,
                    seqan3::fields<seqan3::field::seq, seqan3::field::id>,
                    seqan3::type_list<seqan3::format_fasta>>;
    sequence_file_input_type fin{std::istringstream{input}, seqan3::format_fasta{}};
    // Retrieve the sequences and ids.
    for (auto &[seq, id]: fin) {
        seqan3::debug_stream << "ID:  " << id << '\n';
        seqan3::debug_stream << "SEQ: " << seq << '\n';
        // a quality field also exists, but is not printed, because we know it's empty for FASTA files.
    }

    return 0;
}

smehringer commented 7 months ago

Hi @nhhaidee,

thanks for reaching out!

This is indeed a common use case that is not well handled by our library. The solution is a bit unintuitive:

You can adapt seqan3::sequence_file_input_default_traits_dna by deriving your own traits struct from it:

struct my_traits : seqan3::sequence_file_input_default_traits_dna
{
    using sequence_alphabet = char; // instead of dna5

    template <typename alph>
    using sequence_container = std::basic_string<alph>; // must be defined as a template!
};

With these traits, the sequences are automatically read as std::string (std::string is std::basic_string<char>).

Full Solution:

#include <iostream>
#include <sstream> // std::istringstream

#include <seqan3/io/sequence_file/all.hpp>
#include <seqan3/core/debug_stream.hpp>

auto input = R"(>TEST1
ACGT
>Test2
AGGCTGA
>Test3
GGAGTATAATATATATATATATAT)";

struct my_traits : seqan3::sequence_file_input_default_traits_dna
{
    using sequence_alphabet = char; // instead of dna5

    template <typename alph>
    using sequence_container = std::basic_string<alph>; // must be defined as a template!
};

int main(int argc, const char *argv[]) {

    using sequence_file_input_type =
            seqan3::sequence_file_input<my_traits,
                    seqan3::fields<seqan3::field::seq, seqan3::field::id>,
                    seqan3::type_list<seqan3::format_fasta>>;

    sequence_file_input_type fin{std::istringstream{input}, seqan3::format_fasta{}};
    // Retrieve the sequences and ids.
    for (auto &[seq, id]: fin) {
        std::cout << "ID:  " << id << '\n';
        std::cout << "SEQ: " << seq << '\n';
        // a quality field also exists, but is not printed, because we know it's empty for FASTA files.
    }

    return 0;
}

A working version is on Compiler Explorer: https://godbolt.org/z/PrrooYzTK

As you can see, the sequence can now also be printed with std::cout, since it is a std::string.