jeffdaily / parasail

Pairwise Sequence Alignment Library

sample data sets for testing, using tests, and validation #2

Closed colinbrislawn closed 8 years ago

colinbrislawn commented 9 years ago

Hello Jeff,

How do I run the tests in the 'tests' folder (https://github.com/jeffdaily/parasail/tree/master/tests), and how do I know whether they succeeded?

Are there tests for the example applications with a known, correct answer?

Thank you! Colin

jeffdaily commented 9 years ago

The first matter is whether there are sample data sets. Yes and no. I started a 'data' directory and put some UniProt protein FASTA files in there, one sequence (accession) per file. They are the same sequences Rognes used in the SWIPE performance benchmarks.

I don't have any other test data stored in this git repo. But I have been using a file named 'bacteria.140.protein.faa' that I got from RefSeq.

The only validation performed is a cross-validation of each SIMD implementation against its serial reference implementation, which I also wrote. I see a problem there.

Tests that compare parasail results against a known, correct answer are exactly what I need.
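A known-answer test of this kind could start very small. The following is a minimal sketch, not parasail code: `sw_score` is a hypothetical pure-Python Smith-Waterman with affine gaps, using the assumed convention that a gap of length k costs `gap_open + (k - 1) * gap_extend`. The expected scores are worked out by hand for sequences short enough to check on paper.

```python
NEG_INF = float("-inf")

def sw_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score (Smith-Waterman with affine gaps).

    Hypothetical helper for illustration; not part of parasail.
    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    prev_h = [0] * (len(b) + 1)    # H values of the previous row
    f = [NEG_INF] * (len(b) + 1)   # best score ending in a vertical gap, per column
    best = 0
    for ca in a:
        cur_h = [0] * (len(b) + 1)
        e = NEG_INF                # best score ending in a horizontal gap
        for j, cb in enumerate(b, start=1):
            e = max(e - gap_extend, cur_h[j - 1] - gap_open)
            f[j] = max(f[j] - gap_extend, prev_h[j] - gap_open)
            diag = prev_h[j - 1] + (match if ca == cb else mismatch)
            cur_h[j] = max(0, diag, e, f[j])
            best = max(best, cur_h[j])
        prev_h = cur_h
    return best

# Hand-derived expectations (match=2, mismatch=-1, gap_open=3, gap_extend=1).
KNOWN_ANSWERS = [
    ("ACGT", "ACGT", 8),            # four matches, no gaps
    ("AAAA", "TTTT", 0),            # nothing aligns; a local score floors at 0
    ("ACGTACGT", "ACGTTACGT", 13),  # eight matches (16) minus one gap open (3)
]

for a, b, expected in KNOWN_ANSWERS:
    assert sw_score(a, b) == expected, (a, b, expected)
```

The same table of cases could then be fed to any other implementation, provided its scoring parameters and gap convention are mapped to the ones assumed here.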

jeffdaily commented 9 years ago

I forgot to tell you how to run the test programs.

'test_verify' will compare the three serial reference implementations against all SIMD implementations, for all 65 provided substitution matrices and a handful of affine gap penalties. Only final scores are compared.
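The shape of such a score-only cross-check can be sketched in a few lines. This is an illustration under assumptions, not the actual 'test_verify' source: `sw_reference` and `sw_two_row` are hypothetical pure-Python stand-ins for a serial reference and an optimized implementation, `simple_score` replaces a real substitution matrix, and the sequences and gap penalties are made up.

```python
import itertools

NEG_INF = float("-inf")

def simple_score(x, y):
    """Toy substitution score; a real run would look up a substitution matrix."""
    return 2 if x == y else -1

def sw_reference(a, b, gap_open, gap_extend, score=simple_score):
    """Full-table Smith-Waterman with affine gaps: slow but easy to read."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    e = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    f = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e[i][j] = max(e[i][j - 1] - gap_extend, h[i][j - 1] - gap_open)
            f[i][j] = max(f[i - 1][j] - gap_extend, h[i - 1][j] - gap_open)
            diag = h[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            h[i][j] = max(0, diag, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best

def sw_two_row(a, b, gap_open, gap_extend, score=simple_score):
    """Two-row variant standing in for an optimized (e.g. SIMD) implementation."""
    prev_h = [0] * (len(b) + 1)
    f = [NEG_INF] * (len(b) + 1)
    best = 0
    for ca in a:
        cur_h = [0] * (len(b) + 1)
        e = NEG_INF
        for j, cb in enumerate(b, start=1):
            e = max(e - gap_extend, cur_h[j - 1] - gap_open)
            f[j] = max(f[j] - gap_extend, prev_h[j] - gap_open)
            cur_h[j] = max(0, prev_h[j - 1] + score(ca, cb), e, f[j])
            best = max(best, cur_h[j])
        prev_h = cur_h
    return best

def cross_validate(pairs, gap_penalties):
    """Return every (a, b, gap_open, gap_extend, ref, opt) whose final scores differ."""
    failures = []
    for (a, b), (gap_open, gap_extend) in itertools.product(pairs, gap_penalties):
        ref = sw_reference(a, b, gap_open, gap_extend)
        opt = sw_two_row(a, b, gap_open, gap_extend)
        if ref != opt:
            failures.append((a, b, gap_open, gap_extend, ref, opt))
    return failures

# Made-up inputs for illustration only.
PAIRS = [("MKTAYIAKQR", "MKTAYIQR"), ("HEAGAWGHEE", "PAWHEAE"), ("ACGT", "TTTT")]
GAP_PENALTIES = [(10, 1), (5, 2), (3, 1)]

failures = cross_validate(PAIRS, GAP_PENALTIES)
print("mismatches:", failures)
```

An empty list only shows that the two implementations agree with each other; it says nothing about whether either one is correct.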

'test_verify_tables' is like 'test_verify' but it will compare the dynamic programming tables rather than just the final scores. Great for debugging.

'test_align' takes a fasta file and two sequence indexes within that file and performs every alignment function implemented within parasail. For functions that return the full dynamic programming table, the table is written to a text file. Great for running a visual diff tool to see exactly where the DP tables went wrong.
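For large tables, a small script that reports the first differing cell can complement a visual diff. The sketch below assumes each table was dumped as whitespace-separated integers, one DP row per line; the actual format written by 'test_align' may differ, and `parse_table` and `first_mismatch` are hypothetical helper names.

```python
def parse_table(text):
    """Parse a whitespace-separated table of integers, one DP row per line.

    The dump format is an assumption made for this sketch.
    """
    return [[int(tok) for tok in line.split()] for line in text.strip().splitlines()]

def first_mismatch(table_a, table_b):
    """Return (row, col, value_a, value_b) for the first differing cell, or None."""
    if len(table_a) != len(table_b):
        raise ValueError("tables have different numbers of rows")
    for i, (row_a, row_b) in enumerate(zip(table_a, table_b)):
        if len(row_a) != len(row_b):
            raise ValueError(f"row {i} has different lengths")
        for j, (x, y) in enumerate(zip(row_a, row_b)):
            if x != y:
                return (i, j, x, y)
    return None

# Made-up table dumps; the candidate differs in one cell.
reference_dump = """
0 0 0 0
0 2 0 0
0 0 4 1
"""
candidate_dump = """
0 0 0 0
0 2 0 0
0 0 3 1
"""

print(first_mismatch(parse_table(reference_dump), parse_table(candidate_dump)))
# prints (2, 2, 4, 3)
```

Because each DP cell depends on its upper and left neighbors, the first mismatch in row-major order is usually the most useful one to inspect; later differences are often just the same error propagating.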

'test_openmp' and 'test_query' should be discarded. These tests eventually turned into the parasail_aligner app.

'test_ssw' is the same as the parasail_aligner app but uses only the SSW library routines from Mengyao Zhao et al. This is probably a great starting place to verify parasail against a "known" solution. Wish I had thought of that sooner.

Most of these tests accept a fasta database file via the '-f' parameter. Other parameters will need to be discovered by reading the source files and looking for the getopt call. Sorry that wasn't more helpful.
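Hunting for the getopt call can be partly automated. The sketch below pulls the option string out of C source text and reports which option letters take an argument. The `SAMPLE` snippet is made up for illustration and is not taken from the parasail tests, and the regex only handles a literal option string passed directly to `getopt`.

```python
import re

# Matches getopt(<argc>, <argv>, "<optstring>") with a literal option string.
GETOPT_CALL = re.compile(r'getopt\s*\(\s*\w+\s*,\s*\w+\s*,\s*"([^"]*)"')

def getopt_options(c_source):
    """Map each option letter found in getopt optstrings to whether it takes an argument."""
    options = {}
    for optstring in GETOPT_CALL.findall(c_source):
        # A letter followed by ':' (or '::') takes an argument; a bare letter is a flag.
        for match in re.finditer(r"([A-Za-z0-9])(:{0,2})", optstring):
            letter, colons = match.groups()
            options[letter] = bool(colons)
    return options

# Hypothetical C fragment, not the real test source.
SAMPLE = '''
while ((c = getopt(argc, argv, "f:n:h")) != -1) {
    switch (c) { /* ... */ }
}
'''

print(getopt_options(SAMPLE))
```

Passing the text of a test's source file to `getopt_options` gives a quick list of flags to try; what each flag means still has to be read from the surrounding `switch` statement.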