hyattpd / Prodigal

Prodigal Gene Prediction Software
GNU General Public License v3.0
strand orientation #105

Open mmpust opened 1 year ago

mmpust commented 1 year ago

Hi, I am running Prodigal in metagenomic mode (-p meta) on a gene catalogue in FASTA format to get GFF and GTF files.

prodigal -f gff -p meta \
 -i "$REF" \
 -o "${REF%.fna}.gff"

The FASTA headers look like this:

>lcl|CP000538.1_cds_EAQ71949.1_1 [gene=dnaA] [locus_tag=CJJ81176_0027] [protein=chromosomal replication initiator protein DnaA] [protein_id=EAQ71949.1] [location=1..1323] [gbkey=CDS]
>lcl|CP000538.1_cds_EAQ72022.1_33 [locus_tag=CJJ81176_0064] [protein=cytochrome c family protein] [protein_id=EAQ72022.1] [location=47275..49344] [gbkey=CDS]
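For reference, here is a minimal sketch of how the bracketed fields in headers like these could be pulled apart to recover the original location. The function names `parse_header` and `strand_from_location` are hypothetical. Treating a `complement(...)` wrapper in the location field as the minus-strand marker is an assumption about how the catalogue was exported, not something shown in the two headers above.

```python
import re


def parse_header(header):
    """Split a CDS FASTA header into its sequence ID and [key=value] fields."""
    header = header.lstrip(">")
    # The sequence ID is the first whitespace-delimited token.
    seq_id = header.split(None, 1)[0]
    # Collect every [key=value] pair; values may contain spaces.
    fields = dict(re.findall(r"\[([^=\]]+)=([^\]]*)\]", header))
    return seq_id, fields


def strand_from_location(location):
    """Assumption: minus-strand features are written as complement(...)."""
    return "-" if "complement(" in location else "+"
```

Calling `parse_header` on the second header above would give the ID `lcl|CP000538.1_cds_EAQ72022.1_33` and a `location` field of `47275..49344`.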

The final GFF file after completion looks like this:

# Sequence Data: seqnum=1;seqlen=1323;seqhdr="lcl|CP000538.1_cds_EAQ71949.1_1 [gene=dnaA] [locus_tag=CJJ81176_0027] [protein=chromosomal replication initiator protein DnaA] [protein_id=EAQ71949.1] [location=1..1323] [gbkey=CDS]"
# Model Data: version=Prodigal.v2.6.3;run_type=Metagenomic;model="39|Rickettsia_conorii_Malish_7|B|32.4|11|1";gc_cont=32.40;transl_table=11;uses_sd=1
lcl|CP000538.1_cds_EAQ71949.1_1 Prodigal_v2.6.3 CDS     1       1323    168.6   +       0       ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.302;conf=100.00;score=168.58;cscore=165.36;sscore=3.22;rscore=0.00;uscore=0.00;tscore=3.22;

I was wondering how Prodigal assigns the strand orientation. In the genomes from which the gene catalogue was generated, the strand orientation of this gene is "-" and not "+". Should I update the output GFF file with the original strand information, and which algorithm does Prodigal use to infer the strand? Thanks!
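If overwriting the strand turns out to be the right approach, a rough sketch of that update could look like the following. The function name `override_strand` and the `strand_by_seqid` mapping (sequence ID to "+" or "-", built from the original genome annotation) are hypothetical, and the sketch assumes the GFF is tab-delimited with the strand in column 7. It only touches the strand column; the start and end coordinates stay relative to the input gene sequence rather than the genome.

```python
def override_strand(gff_lines, strand_by_seqid):
    """Yield GFF lines with column 7 replaced wherever a strand is known."""
    for line in gff_lines:
        stripped = line.rstrip("\n")
        # Pass comment, directive, and blank lines through unchanged.
        if not stripped or stripped.startswith("#"):
            yield stripped
            continue
        cols = stripped.split("\t")
        # Only rewrite well-formed feature rows whose sequence ID is mapped.
        if len(cols) >= 9 and cols[0] in strand_by_seqid:
            cols[6] = strand_by_seqid[cols[0]]
        yield "\t".join(cols)
```

Feature rows whose sequence ID is missing from the mapping are left exactly as Prodigal wrote them.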