tseemann / barrnap

:microscope: :leo: Bacterial ribosomal RNA predictor
GNU General Public License v3.0

GFF output and Fasta-headers give different start-coordinates of rRNA-genes #53

Open jvollme opened 3 years ago

jvollme commented 3 years ago

Barrnap v0.9 produces GFF output and, optionally, FASTA output. The FASTA headers contain the coordinates of each rRNA prediction but not its e-value. The GFF output contains the e-value as well.

I have now noticed that the start positions given in the FASTA headers differ from the start positions given in the GFF output (usually by 1). This is a bit of a problem for me. To catch any possible variation of rRNA genes in metagenomic bins, I run barrnap for all three kingdoms (bac, arc and euk) consecutively, then identify overlapping hits and keep only the highest-scoring hit (i.e. the one with the lowest e-value) in each group of overlapping hits. This means I have to match the GFF records (to get the e-values) against the FASTA headers.
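To make the workflow concrete, here is a minimal sketch of the filtering step working on the GFF records alone. It assumes the e-value is stored in the GFF score column (column 6) and that two hits count as overlapping whenever their intervals intersect on the same sequence, regardless of strand. The names `Hit`, `parse_gff` and `best_non_overlapping` are made up for this illustration and are not part of barrnap.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    seqid: str
    start: int      # 1-based, inclusive (GFF convention)
    end: int        # 1-based, inclusive
    strand: str
    evalue: float   # assumption: taken from GFF column 6 (score)
    attributes: str


def parse_gff(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated GFF feature lines, skipping comments and blanks."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete feature line
        hits.append(
            Hit(cols[0], int(cols[3]), int(cols[4]), cols[6],
                float(cols[5]), cols[8])
        )
    return hits


def overlaps(a: Hit, b: Hit) -> bool:
    """True if both hits lie on the same sequence and share at least one base."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def best_non_overlapping(hits: Iterable[Hit]) -> List[Hit]:
    """Greedy filter: visit hits from lowest to highest e-value and keep a
    hit only if it does not overlap one that was already kept."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.seqid, h.start))
```

The hits from the bac, arc and euk runs would be concatenated into one list before calling `best_non_overlapping`. This only yields the coordinates and e-values of the winning hits; the sequences themselves still have to be matched via the FASTA headers, which is where the start-coordinate difference matters.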

Is this difference a bug, or is it due to some requirement of the GFF specification? Can I safely assume that the start is off by exactly 1 in ALL cases and correct for it, or is the relationship less predictable?
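One possible explanation, which I have not confirmed, is a difference in coordinate conventions. GFF coordinates are 1-based and inclusive, whereas BED-style coordinates are 0-based and half-open. For the same interval, the start then differs by exactly 1 and the end is identical. If that is what is happening here, the conversion would look like the sketch below. The function names and the `seqid:start-end` key format are hypothetical and are not taken from barrnap.

```python
from typing import Tuple


def gff_to_zero_based_half_open(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based inclusive interval (GFF) to a 0-based half-open one.

    Only the start shifts by one; the end stays numerically the same.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def header_key(seqid: str, gff_start: int, gff_end: int) -> str:
    """Build a lookup key of the hypothetical form 'seqid:start-end' using
    0-based half-open coordinates, for matching against FASTA headers."""
    start0, end0 = gff_to_zero_based_half_open(gff_start, gff_end)
    return f"{seqid}:{start0}-{end0}"
```

Under this assumption, a GFF record on `contig1` spanning 100 to 1600 would correspond to the key `contig1:99-1600`, and the length is `1600 - 99 = 1501` bases in both conventions.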

Alternatively, it would be very helpful to add either the corresponding FASTA sequence ID to the GFF output or the e-value to the FASTA header.