mengyao / Complete-Striped-Smith-Waterman-Library


Sequence information doesn't clear #2

Closed mortonjt closed 11 years ago

mortonjt commented 11 years ago

I made a very simple program, mostly based on example.cpp:

    void PrintAlignment(const StripedSmithWaterman::Alignment& alignment);

    int main(){
        std::string line,query,ref;
        std::stringstream ss;
        getline(std::cin,line);
        while(!std::cin.eof()){
            ss<<line;
            ss>>query>>ref;
            {
                StripedSmithWaterman::Aligner aligner;
                StripedSmithWaterman::Filter filter;
                StripedSmithWaterman::Alignment alignment;
                aligner.Align(query.c_str(), ref.c_str(), ref.size(), filter, &alignment);
                PrintAlignment(alignment);
                alignment.Clear();
                aligner.ReBuild();
            }
            getline(std::cin,line);
        }
    }

    void PrintAlignment(const StripedSmithWaterman::Alignment& alignment){
        std::cout << alignment.query_begin << "\t"
                  << alignment.query_end << "\t"
                  << alignment.ref_begin << "\t"
                  << alignment.ref_end << "\t"
                  << alignment.cigar_string << std::endl;
    }

And I gave it a couple of test cases:

    TTAAGTAAAA TTAAGGTGATGTGTGTGTAAAA
    TTTTTTTT TTTTAAAATTTT

After I compile and run this program, I get the following output:

    When maskLen < 15, the function ssw_align doesn't return 2nd best alignment information.
    4	9	16	21	4S6M
    When maskLen < 15, the function ssw_align doesn't return 2nd best alignment information.
    4	9	16	21	4S6M

My first thought is that there is a memory bug where the alignment information from the first alignment isn't being cleared. I hope this issue can be investigated. I think this library has potential.
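Before suspecting the library, it may be worth ruling out the input parsing, since both output lines match the first test case exactly. The sketch below uses only the standard library and assumes the input lines carry no trailing whitespace. It shows that a `std::stringstream` reused across lines, as in the program above, can stop accepting new data once an extraction has reached the end of the stream, leaving `query` and `ref` unchanged. The function names are hypothetical and are not part of SSW.

```cpp
#include <sstream>
#include <string>
#include <utility>

// Mirrors the loop in the program above: one stream object is reused for
// two consecutive input lines. Returns the (query, ref) pair held after the
// second line has been fed in.
std::pair<std::string, std::string> ParseWithReusedStream(const std::string& first,
                                                          const std::string& second) {
    std::stringstream ss;
    std::string query, ref;
    ss << first;
    ss >> query >> ref;  // reading the last token hits end-of-stream: eofbit is set
    ss << second;        // the stream is no longer good(), so nothing is inserted
    ss >> query >> ref;  // extraction fails as well; query and ref keep old values
    return {query, ref};
}

// One possible remedy (an assumption, not the library author's advice):
// build a fresh stream for every line so no error state carries over.
std::pair<std::string, std::string> ParseWithFreshStream(const std::string& line) {
    std::istringstream ss(line);
    std::string query, ref;
    ss >> query >> ref;
    return {query, ref};
}
```

If this is what happens here, the second `Align` call would receive the first pair of sequences again, which would explain the duplicated result without any stale state inside the aligner. Calling `ss.clear()` and `ss.str("")` before each reuse would be another way to reset the stream.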

mengyao commented 11 years ago

Hi,

Thanks a lot for your interest in the SSW library.

If you only need the best alignment, you can ignore the message "When maskLen < 15, the function ssw_align doesn't return 2nd best alignment information." SSW prints this message when the given sequences are very short; it means that suboptimal (second-best) alignment information cannot be reported.

In this case, the best alignment results are still correct. For example, your first query is "TTAAGTAAAA" and the reference is "TTAAGGTGATGTGTGTGTAAAA". The alignment results mean:

    0123456789012345678901
    TTAAGGTGATGTGTGTGTAAAA
                SSSS||||||
                TTAAGTAAAA
                0123456789

In the CIGAR string, 'S' means soft clip and 'M' means match or mismatch.

I hope this explanation is clear.
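To make the reading of "4S6M" concrete, here is a minimal sketch of a CIGAR parser. It is not SSW code, and the names `ParseCigar`, `QueryBases`, and `ReferenceBases` are hypothetical. It follows the usual SAM convention for which operations consume query or reference bases.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// One CIGAR operation: run length plus operation letter, e.g. {4, 'S'}.
using CigarOp = std::pair<int, char>;

// Splits a CIGAR string such as "4S6M" into (length, operation) pairs.
std::vector<CigarOp> ParseCigar(const std::string& cigar) {
    std::vector<CigarOp> ops;
    int length = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + (c - '0');  // accumulate multi-digit lengths
        } else {
            ops.push_back({length, c});
            length = 0;
        }
    }
    return ops;
}

// Query bases covered: M, =, X, I and S consume query bases.
int QueryBases(const std::vector<CigarOp>& ops) {
    int total = 0;
    for (const CigarOp& op : ops) {
        const char o = op.second;
        if (o == 'M' || o == '=' || o == 'X' || o == 'I' || o == 'S') total += op.first;
    }
    return total;
}

// Reference bases covered: M, =, X, D and N consume reference bases.
int ReferenceBases(const std::vector<CigarOp>& ops) {
    int total = 0;
    for (const CigarOp& op : ops) {
        const char o = op.second;
        if (o == 'M' || o == '=' || o == 'X' || o == 'D' || o == 'N') total += op.first;
    }
    return total;
}
```

For "4S6M" this gives 10 query bases (the full length of "TTAAGTAAAA") and 6 reference bases, which agrees with the reported coordinates: query 4 to 9 and reference 16 to 21 each span 6 positions.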

If the message "When maskLen < 15, the function ssw_align doesn't return 2nd best alignment information." causes trouble in real usage, please redirect it to an error file. The program prints it to STDERR.

Please feel free to contact me if you need further help.
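From a shell, appending something like `2> ssw_warnings.log` to the command line is usually enough. If the redirection has to happen inside the program instead, one option is sketched below. It assumes the message is written through the C `stderr` stream. The helper name `RedirectStderrToFile` and the log file name are hypothetical.

```cpp
#include <cstdio>

// Reopens the C stderr stream on the given file, so that anything later
// written to stderr lands in that file instead of the terminal.
// Returns false if the file could not be opened.
bool RedirectStderrToFile(const char* path) {
    return std::freopen(path, "w", stderr) != nullptr;
}
```

Calling `RedirectStderrToFile("ssw_warnings.log")` once at the start of `main`, before the first alignment, should keep the warning out of the regular output, which goes to STDOUT.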

Many thanks,

Mengyao