soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

Validation/regression tests for hh-suite. #31

Closed sowani closed 8 years ago

sowani commented 8 years ago

Hi,

Now that I am done with the initial port of hh-suite to ppc64le and am able to run a few commands from the User Guide without crashing hh-suite, I would like to test the ported code for validity/accuracy. As I am a lay programmer without any knowledge of DNA or proteins, it is tough for me to know whether the output I get is correct. The crash-free execution of hh-suite makes me happy, but I have no idea whether the results it produces are right.

At present I am planning to set up hh-suite on an x86_64 machine, run test commands (again picked from the User Guide) and generate a baseline. Then I will generate the same output on ppc64le and compare it with the x86 results. However, I am sure that this cannot be claimed as a "tested and validated" port.

I did not find any test cases packaged along with the source code. What could be the best way to validate the port?

Thanks, Atul.

martin-steinegger commented 8 years ago

Congratulations. :)

Since you can expect the same results as from the x86_64 implementation, I would pick 100 random proteins from UniProt (I can do this for you if you are not familiar with UniProt), run both HHblits versions (x86 and ppc64le) and diff the results. Be aware that using the ppc64le SIMD float units might result in slight differences in the scores (±0.1).
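Because small score differences (around ±0.1) are expected, a plain `diff` may flag files that are effectively equivalent. The following is a minimal sketch of a tolerant comparison, not part of hh-suite: `compare_with_tolerance` is a hypothetical helper that splits each line on whitespace and accepts numeric fields that differ by at most a given absolute tolerance. It knows nothing about the actual HHR layout, so every numeric column is treated the same way (very small values such as E-values will always fall within the tolerance), and any header line that legitimately differs between runs would still be reported.

```shell
#!/bin/bash
# compare_with_tolerance FILE_A FILE_B [TOL]
# Hypothetical helper (an assumption, not an hh-suite tool).
# Exit status 0 if both files have the same lines and fields, where numeric
# fields may differ by at most TOL (default 0.1); exit status 1 otherwise.
compare_with_tolerance() {
    local file_a=$1 file_b=$2 tol=${3:-0.1}
    awk -v tol="$tol" -v other="$file_b" '
        # True if s looks like a decimal number, optionally with an exponent.
        function is_num(s) {
            return s ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/
        }
        # True if |x - y| <= tol.
        function close_enough(x, y,    d) {
            d = x - y
            if (d < 0) d = -d
            return d <= tol
        }
        # Report the first mismatch on stderr and remember the failure.
        function fail(msg) {
            print "line " NR ": " msg > "/dev/stderr"
            status = 1
        }
        BEGIN { status = 0 }
        {
            # Read the corresponding line of the second file.
            if ((getline line_b < other) <= 0) { fail("missing in second file"); exit }
            na = split($0, fa)
            nb = split(line_b, fb)
            if (na != nb) { fail("field count differs"); exit }
            for (i = 1; i <= na; i++) {
                if (fa[i] == fb[i]) continue
                if (is_num(fa[i]) && is_num(fb[i]) && close_enough(fa[i], fb[i])) continue
                fail("field " i ": " fa[i] " vs " fb[i]); exit
            }
        }
        END {
            # The second file must not have lines left over.
            if (!status && (getline line_b < other) > 0) fail("second file has extra lines")
            exit status
        }
    ' "$file_a"
}
```

It could be called as `compare_with_tolerance x86/1.hhr ppc64le/1.hhr 0.1`, where both paths are placeholders. Differences that land exactly on the tolerance may go either way because of floating-point rounding, so a slightly larger tolerance is the safer choice.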

sowani commented 8 years ago

@martin-steinegger Thanks for your help! Could you please send me results generated on x86 for 100 random proteins? If the resulting data is going to be too big to transfer, will it be possible for you to create a script which I can execute in my x86 environment and generate the results locally?

martin-steinegger commented 8 years ago

I picked 100 random sequences from the UniProt database and wrote a small script that generates an HHR and an A3M file for each of the 100 sequences. Just run this script with the x86 and the ppc64le versions and diff the results. Please let me know if you have any questions. 100randsequences.zip
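The "run on both platforms and diff" step can be sketched as below, assuming the results of each run have been copied into two separate directories. `compare_result_dirs` is a hypothetical helper, not something shipped in the zip file; it walks the `.hhr` and `.a3m` files of the baseline directory and reports each one that is missing or not byte-identical in the other directory.

```shell
#!/bin/bash
# compare_result_dirs BASELINE_DIR CANDIDATE_DIR
# Hypothetical helper (an assumption, not part of the benchmark script).
# Prints one line per baseline result file that is missing or different in
# CANDIDATE_DIR and returns 1 if anything was reported, 0 otherwise.
compare_result_dirs() {
    local baseline=$1 candidate=$2 rc=0 f name
    for f in "$baseline"/*.hhr "$baseline"/*.a3m; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=$(basename "$f")
        if [ ! -e "$candidate/$name" ]; then
            echo "MISSING $name"
            rc=1
        elif ! cmp -s "$f" "$candidate/$name"; then
            echo "DIFFERS $name"           # byte-for-byte comparison
            rc=1
        fi
    done
    return "$rc"
}
```

A call might look like `compare_result_dirs x86-results ppc64le-results`, with both directory names being placeholders. The comparison is deliberately strict, so files that differ only by a small score change will also show up as `DIFFERS` and need a closer look; files that exist only in the candidate directory are not reported.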

sowani commented 8 years ago

@martin-steinegger Thanks for the sequences and the script. I am using this data to generate the baseline on x86. I modified the run_benchmark.sh script slightly to keep /tmp tidy. Here is the diff:

    *** a/run_benchmark.sh 2016-11-20 17:10:44.000000000 +0530
    --- b/run_benchmark.sh 2016-11-21 18:30:37.292000000 +0530
    ***************
    *** 1,7 ****
      #!/bin/bash
      CPU=1
    ! tmpdir="/tmp"
      IT=1
      for seq in $(seq 1 100); do
      awk -v line="$seq" '/>/{i++}i==line{print; next; print; exit}' 100.random.seq > $tmpdir/${seq}.fasta
      hhblits -i $tmpdir/${seq}.fasta -d $DB -n $IT -oa3m $tmpdir/${seq}.a3m -o $tmpdir/${seq}.hhr -cpu $CPU
    --- 1,11 ----
      #!/bin/bash
      CPU=1
    ! tmpdir="/tmp/hh-results"
      IT=1
    + DB=/root/hhsuite-3.0.1-Linux/dbs/scop70_1.75
    + if [ ! -d /tmp/hh-results ]; then
    + mkdir -p $tmpdir
    + fi
      for seq in $(seq 1 100); do
      awk -v line="$seq" '/>/{i++}i==line{print; next; print; exit}' 100.random.seq > $tmpdir/${seq}.fasta
      hhblits -i $tmpdir/${seq}.fasta -d $DB -n $IT -oa3m $tmpdir/${seq}.a3m -o $tmpdir/${seq}.hhr -cpu $CPU

Thanks! Atul.
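The awk call in the script writes the N-th sequence of the multi-FASTA file into its own file by counting `>` header lines. A minimal sketch of the same idea as a standalone function follows; `extract_fasta_record` is a hypothetical name, and unlike the one-liner in the diff it only counts `>` at the start of a line and stops reading once the requested record has been printed.

```shell
#!/bin/bash
# extract_fasta_record N FILE
# Hypothetical helper illustrating the record-selection step.
# Prints the N-th record (header line plus sequence lines) of a FASTA file.
extract_fasta_record() {
    local n=$1 file=$2
    awk -v n="$n" '
        /^>/   { i++; if (i > n) exit }   # new record; stop once past the target
        i == n { print }                  # header and sequence lines of record n
    ' "$file"
}
```

For example, `extract_fasta_record 7 100.random.seq > 7.fasta` would produce the input for the seventh hhblits run, using the file name that appears in the script.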

sowani commented 8 years ago

The script executed successfully. I timed it and got the following results on my x86 VM:

real 6m26.182s
user 6m19.844s
sys 0m5.688s

I am closing this issue now. With this as the baseline I will start checking the ppc64le port.

BTW, a suggestion: could you please include these two files (i.e. the contents of 100randomsequences.zip) with the hh-suite source code, so that users have a ready-made test suite available?

Thanks, Atul.

lydonchandra commented 1 year ago

Hi @sowani, how are you? Did you end up adding more validation/regression tests?