mothur / mothur

Welcome to the mothur project, initiated by Dr. Patrick Schloss and his software development team in the Department of Microbiology & Immunology at The University of Michigan. This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
www.mothur.org
GNU General Public License v3.0

align.seqs() of version 1.45.3 produces unexpectedly poor PairwiseAlignmentLength scores #785

Closed AdMePo closed 3 years ago

AdMePo commented 3 years ago

Hi,

I was using Mothur 1.45.3 to align ASV sequences against a custom reference database. I was surprised by the poor PairwiseAlignmentLength scores obtained with align.seqs() (typically a dozen below 20, thousands at 0, and another dozen around -200), whereas this operation typically produced scores above 200 for most of the ASVs with a previous version of Mothur.

As I cannot currently disclose the data I am working on, and in order to double-check before reporting a bug, I reran the align.seqs() command as follows (mothur > align.seqs(candidate=abrecovery.fasta, template=core_set_aligned.imputed.fasta)) with the data and instructions provided on the Mothur wiki help page (https://mothur.org/wiki/align.seqs/). Again, the PairwiseAlignmentLength scores (see attached logfile and report) were unexpectedly poor compared to the results on the help page.

Finally, I reran my own analysis with Mothur 1.44.3. This time the results were fully satisfactory.

I thus conclude there may be a bug with align.seqs() of Mothur 1.45.3 that was not present in 1.44.3.

I hope I have provided everything needed to fix this putative bug, and I take this opportunity to thank you for providing the Mothur software to the microbial ecology research community.

All the best,

Adrien MP

mothur.1627028480.logfile.txt abrecovery.align.report.txt abrecovery.align.txt

mothur-westcott commented 3 years ago

Thanks for reporting this bug. The *.align.report file has two repeated column headers, which is causing the confusion: SearchMethod and SearchScore are added to the header line twice. Removing the duplicate headers aligns the 12th column with PairwiseAlignmentLength, which has values in the 400s.

QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template

AY457885 504 356687 1366 kmer 70.42 needleman 62 504 1 439 444 1 5 1 70.27
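
Until a fixed release is available, the header repair described above can be applied when reading the report. The following is a minimal Python sketch, not part of mothur: the function names `dedupe_header` and `read_report` are hypothetical, fields are assumed to be whitespace-separated with no embedded spaces, and the position of the duplicated SearchMethod and SearchScore headers in `BUGGY_HEADER` is a guess for illustration only (check your own report file).

```python
def dedupe_header(fields):
    """Drop repeated column names, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for name in fields:
        if name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


def read_report(text):
    """Parse an align.report-style table into a list of dicts keyed by column name.

    Assumes whitespace-separated fields and a header line that may contain
    repeated column names (hypothetical helper, not a mothur API).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = dedupe_header(lines[0].split())
    rows = []
    for line in lines[1:]:
        values = line.split()
        if len(values) != len(header):
            # Refuse to guess if the repaired header still does not match the data.
            raise ValueError(
                f"expected {len(header)} fields, found {len(values)}: {line!r}"
            )
        rows.append(dict(zip(header, values)))
    return rows


# Illustrative header: where the duplicates sit is an assumption.
BUGGY_HEADER = (
    "QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore "
    "SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart "
    "TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert "
    "SimBtwnQuery&Template"
)
# Data row taken from the comment above.
DATA_ROW = "AY457885 504 356687 1366 kmer 70.42 needleman 62 504 1 439 444 1 5 1 70.27"

rows = read_report(BUGGY_HEADER + "\n" + DATA_ROW)
print(rows[0]["PairwiseAlignmentLength"])  # prints 444
```

Looking columns up by name rather than by position avoids the misreading reported in this issue, and the field-count check makes the sketch fail loudly if the header and data rows still disagree after the duplicates are dropped.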