seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.
https://www.seqan.de

FASTA IO parses newlines into sequence when alphabet is char #3126

Closed: smehringer closed this issue 1 year ago.

smehringer commented 1 year ago

Does this problem persist on the current master?

Is there an existing issue for this?

Current Behavior

This bug was introduced by https://github.com/seqan/seqan3/pull/3104.

When I read a FASTA file with the sequence alphabet set to char, the newlines end up in my sequence string.

The following test fails:

struct char_traits : public seqan3::sequence_file_input_default_traits_dna
{
    using sequence_alphabet = char;
    using sequence_legal_alphabet = char;
};
using sequence_file_type = seqan3::sequence_file_input<char_traits,
                                                       seqan3::fields<seqan3::field::id, seqan3::field::seq>,
                                                       seqan3::type_list<seqan3::format_fasta>>;

TEST_F(read, whitespace_in_seq_char_alphabet)
{
    std::string input{">ID1\n"
                      "ACGTTTT\n\nTTTTTTTTTTT\n"
                      "\n"
                      ">ID2\n"
                      "ACGTTTT\t\tTTTTTTTTTTT\t\nTTTTTTTTTTT\vTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT\rTTTTTTTTTTTTTTTTT\n"
                      ">ID3 lala\n"
                      "ACGT\fTTA\n"};
    std::stringstream istream{input};

    sequence_file_type fin{istream, seqan3::format_fasta{}};

    // ids and seqs are the expected values provided by the read fixture of the FASTA format test
    auto it = fin.begin();
    for (unsigned i = 0; i < 3; ++i, ++it)
    {
        EXPECT_EQ((*it).id(), ids[i]);
        EXPECT_RANGE_EQ((*it).sequence(), seqs[i] | seqan3::views::to_char);
    }
}

It fails with the following output (only the error for i = 0 is shown):

Expected equality of these values:
  (*it).sequence()
    Which is: 
ACGTTTT

TTTTTTTTTTT

  seqs[i] | seqan3::views::to_char
    Which is: ACGTTTTTTTTTTTTTTT

Expected Behavior

The test passes, and my sequence contains no newline characters.

Steps To Reproduce

Copy the test above into the FASTA format unit test and run it.

Environment

- Operating system: Linux
- SeqAn version: current master
- Compiler: GCC-10.4

Anything else?

No response