hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0

Differences in gene prediction for forward and reverse sequences #90

Open caballero opened 2 years ago

caballero commented 2 years ago

I am testing gene prediction on metagenomic contigs. Because the contigs are assembled, I don't know their orientation (a contig could be the forward or the reverse-complement sequence). I tested what happens when I provide each sequence in either direction. In general the predictions agree completely, but in 20-25% of my tests they differ depending on whether I provide the forward or the reverse sequence.

For example:

$ echo -e ">test\nGGTCCGCCCGGCCCTCCGGGCCCTCCGGGCCCGCCGGGCCCGCCGACGCCGCCCACGCCGCCGTTGCTGCCCTCGCAGCCGGAGTTGCACCCGGTGCCGCCGGTGCCGCCGGTCTGGCCCTGGCCGCCCGCGCCGCCGGTGCCACCCTTGCCGCCAGCGCCCGAATAGCTGAGCAGACGAAGGGGTTCACGTTCGACGGGTCGAGACGCCGAGATTGACGATGTTACGAAGGTCATCTGGGTCAGGCCGGAATTGCCCGTTCCCGCCCGGGCGCCGGTCGTGGCGGCGGCGCCCGGGCCCCGGTTGGCCGCGATCACCGTGGAAGGTGAAGGTGCCCACCGGGAATCGCCGACCGGCCGTGCTCGTGCGCGCCCGTGACGCTCATGCGGGGCGCACCGGCGTACCGGGTCGTCCCCTGGGCCCCGGGTTGCCGACGATAGCCGAGATGGTAGATGTGCGAACGACGGATCTGGGGTTGGGCGTGATCAACCAGGTTTGTCGCCGTGATCGCAGCTGAGGTGGACATTGATCGTCAGAGACACCGCCGTCGAACACGAAGCGTGCCCGCGTTGATCGCATAGGACTGGGCGTTGGGGCCGATGAACTGATCACGTTGTCTGATGGGTCACGGTGGGCCGGAGGTGACAAGCCCAACCGTCACCAGGGGAACACCTGGTCCGCCAGCTGATGCAACCGCGTCCGAGAGCGCTCCTACGCCGAAAGACGTGATCCATGCAGGAGAAATGCATGTTCGGGATGGACCGTGTCGGCGCGAGGGCCGGTTCTCATGTAGGCGCGGCGCTGGTCGTCCGGTCGCCTTGGCCCGCGAAAGAGCTTAACGATACTGGCTGATCGAACATCGACGTGACTTCGCTGGTGATCGTCTCGCCCCGCCGACCAGCGGCGTCGGTGTCCTGACCCATAGCCTGGGCCGAGCCAGATAGTTGCATAACCGCGTCTCCCCTGTGAAGCTTATTGCCTGATCGTTCAGTGATTCGCGTGAGCATTGCAAGCGAATCTCAATTCGTCGTGGACGGTCAGGGGCGGCGCTGTCCGTAGCGGTTGGAACGCGCTCGAAACGAGTATCGGCCGGGTTCGCAATCGTGGGCGGGGGCCAGCGCTCCGCGCGCAAGGGAGGGCTCGGGGAGGGGGGGCGAAGCGGGGAGAGGGGCTGCGCGATAGCGCCTGTTCGGCAGAAGACAGACCGAAACACTAGCGAACGCGCCGCTCTCGCCGAAAAGATTGCACGCAACGTCGATGGAGGCAATCGCGATAGCAGCAAACATAAGAAGCTCGAAGGCATCGCGGATTTGAAGGCGTCGCGCGCGAAGGCTGCTGGATGCACGTGCGCGCCTGCGCGAGCTGGCTCTGCGCCCATCACTATGGTGCCCGACGTGCGGTCCCCGACTATCTCCGCCGCCGCGGCGCATCGCCAATCGCTGGCCGTCGCTCGCCAGTGCGCAGTCGGACGAGCGATCAGCGCGTTCATTCGGAACGCCGTGTCCGCCAGGGCCGAGGGAAATGGAACTCGCGTGCAGCGCGGCGATATCTGGACCGTTTCGGGCGGCAGGGACTACGCGGGCAAGCCGCGTCCCGTCGTCATCGTCCAGGATGATAGTTTCGACATGACGTACTCCGTCACCATCTGCGCCTTCACCACCGACACGACCGACGCGCCGCTGTTTCGCCT" | ./prodigal -p meta 
-------------------------------------
PRODIGAL v2.6.3 [February, 2016]         
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Request:  Metagenomic, Phase:  Training
Initializing training files...done!
-------------------------------------
Request:  Metagenomic, Phase:  Gene Finding
Finding genes in sequence #1 (1690 bp)...done!
DEFINITION  seqnum=1;seqlen=1690;seqhdr="test";version=Prodigal.v2.6.3;run_type=Metagenomic;model="31|Natrialba_magadii_ATCC_43099|A|61.4|11|0";gc_cont=61.40;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             complement(<3..317)
                     /note="ID=1_1;partial=10;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.765;conf=98.18;score=17.34;cscore=21.22;sscore=-3.88;rscore=0.00;uscore=-3.46;tscore=-5.18;"
     CDS             complement(836..1489)
                     /note="ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.639;conf=90.25;score=9.68;cscore=6.45;sscore=3.23;rscore=0.00;uscore=-1.00;tscore=4.88;"
     CDS             1533..>1688
                     /note="ID=1_3;partial=01;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.641;conf=83.11;score=6.93;cscore=16.68;sscore=-9.75;rscore=0.00;uscore=-4.57;tscore=-5.18;"
//

But if I provide the reverse-complement sequence, one prediction is missing and another has different coordinates:

$ echo -e ">test\nGGTCCGCCCGGCCCTCCGGGCCCTCCGGGCCCGCCGGGCCCGCCGACGCCGCCCACGCCGCCGTTGCTGCCCTCGCAGCCGGAGTTGCACCCGGTGCCGCCGGTGCCGCCGGTCTGGCCCTGGCCGCCCGCGCCGCCGGTGCCACCCTTGCCGCCAGCGCCCGAATAGCTGAGCAGACGAAGGGGTTCACGTTCGACGGGTCGAGACGCCGAGATTGACGATGTTACGAAGGTCATCTGGGTCAGGCCGGAATTGCCCGTTCCCGCCCGGGCGCCGGTCGTGGCGGCGGCGCCCGGGCCCCGGTTGGCCGCGATCACCGTGGAAGGTGAAGGTGCCCACCGGGAATCGCCGACCGGCCGTGCTCGTGCGCGCCCGTGACGCTCATGCGGGGCGCACCGGCGTACCGGGTCGTCCCCTGGGCCCCGGGTTGCCGACGATAGCCGAGATGGTAGATGTGCGAACGACGGATCTGGGGTTGGGCGTGATCAACCAGGTTTGTCGCCGTGATCGCAGCTGAGGTGGACATTGATCGTCAGAGACACCGCCGTCGAACACGAAGCGTGCCCGCGTTGATCGCATAGGACTGGGCGTTGGGGCCGATGAACTGATCACGTTGTCTGATGGGTCACGGTGGGCCGGAGGTGACAAGCCCAACCGTCACCAGGGGAACACCTGGTCCGCCAGCTGATGCAACCGCGTCCGAGAGCGCTCCTACGCCGAAAGACGTGATCCATGCAGGAGAAATGCATGTTCGGGATGGACCGTGTCGGCGCGAGGGCCGGTTCTCATGTAGGCGCGGCGCTGGTCGTCCGGTCGCCTTGGCCCGCGAAAGAGCTTAACGATACTGGCTGATCGAACATCGACGTGACTTCGCTGGTGATCGTCTCGCCCCGCCGACCAGCGGCGTCGGTGTCCTGACCCATAGCCTGGGCCGAGCCAGATAGTTGCATAACCGCGTCTCCCCTGTGAAGCTTATTGCCTGATCGTTCAGTGATTCGCGTGAGCATTGCAAGCGAATCTCAATTCGTCGTGGACGGTCAGGGGCGGCGCTGTCCGTAGCGGTTGGAACGCGCTCGAAACGAGTATCGGCCGGGTTCGCAATCGTGGGCGGGGGCCAGCGCTCCGCGCGCAAGGGAGGGCTCGGGGAGGGGGGGCGAAGCGGGGAGAGGGGCTGCGCGATAGCGCCTGTTCGGCAGAAGACAGACCGAAACACTAGCGAACGCGCCGCTCTCGCCGAAAAGATTGCACGCAACGTCGATGGAGGCAATCGCGATAGCAGCAAACATAAGAAGCTCGAAGGCATCGCGGATTTGAAGGCGTCGCGCGCGAAGGCTGCTGGATGCACGTGCGCGCCTGCGCGAGCTGGCTCTGCGCCCATCACTATGGTGCCCGACGTGCGGTCCCCGACTATCTCCGCCGCCGCGGCGCATCGCCAATCGCTGGCCGTCGCTCGCCAGTGCGCAGTCGGACGAGCGATCAGCGCGTTCATTCGGAACGCCGTGTCCGCCAGGGCCGAGGGAAATGGAACTCGCGTGCAGCGCGGCGATATCTGGACCGTTTCGGGCGGCAGGGACTACGCGGGCAAGCCGCGTCCCGTCGTCATCGTCCAGGATGATAGTTTCGACATGACGTACTCCGTCACCATCTGCGCCTTCACCACCGACACGACCGACGCGCCGCTGTTTCGCCT\n" | perl -pe 'unless (/>/) { $_= reverse $_; tr/ACGT/TGCA/}' | ./prodigal -p meta 
-------------------------------------
PRODIGAL v2.6.3 [February, 2016]         
Univ of Tenn / Oak Ridge National Lab
Doug Hyatt, Loren Hauser, et al.     
-------------------------------------
Request:  Metagenomic, Phase:  Training
Initializing training files...done!
-------------------------------------
Request:  Metagenomic, Phase:  Gene Finding
Finding genes in sequence #1 (1690 bp)...done!
DEFINITION  seqnum=1;seqlen=1690;seqhdr="test";version=Prodigal.v2.6.3;run_type=Metagenomic;model="31|Natrialba_magadii_ATCC_43099|A|61.4|11|0";gc_cont=61.40;transl_table=11;uses_sd=0
FEATURES             Location/Qualifiers
     CDS             complement(<3..308)
                     /note="ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.670;conf=96.25;score=14.12;cscore=13.17;sscore=0.95;rscore=0.00;uscore=-3.28;tscore=4.88;"
     CDS             1374..>1688
                     /note="ID=1_2;partial=01;start_type=GTG;rbs_motif=None;rbs_spacer=None;gc_cont=0.765;conf=98.18;score=17.34;cscore=21.22;sscore=-3.88;rscore=0.00;uscore=-3.46;tscore=-5.18;"
//
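
To compare the two runs directly, the coordinates from the reverse-complement run can be mapped back onto the forward sequence. The sketch below is in Python and is not part of Prodigal. The helper name `flip` is hypothetical, and the coordinates are copied by hand from the two outputs above. It assumes 1-based inclusive coordinates and ignores the partial markers (`<` and `>`).

```python
def flip(start, end, strand, seqlen):
    """Map a 1-based inclusive interval onto the reverse-complemented sequence."""
    new_strand = "-" if strand == "+" else "+"
    return seqlen - end + 1, seqlen - start + 1, new_strand


SEQLEN = 1690  # seqlen reported in both DEFINITION lines

# (start, end, strand) copied from the two FEATURES tables above;
# "complement(...)" is recorded as the "-" strand.
forward_run = [(3, 317, "-"), (836, 1489, "-"), (1533, 1688, "+")]
reverse_run = [(3, 308, "-"), (1374, 1688, "+")]

# Express the reverse-run predictions in forward-run coordinates.
mapped = [flip(s, e, strand, SEQLEN) for s, e, strand in reverse_run]

for gene in forward_run:
    status = "identical" if gene in mapped else "differs or missing"
    print(gene, status)

print("reverse run in forward coordinates:", mapped)
```

In forward coordinates, the reverse run reports `complement(3..317)` and `1383..1688`. The first matches gene 1_1 of the forward run exactly, including its scores. The second ends at the same position as `1533..>1688` but starts 150 bp upstream (ATG instead of GTG), so it overlaps the region of `complement(836..1489)`, which is the prediction absent from the reverse run.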