gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

gffcompare run on stringtie --merge gtf file results in zeros in the summary text #21

Closed. adeslatt closed this issue 7 years ago.

adeslatt commented 7 years ago

Hello,

I am getting empty output from gffcompare. Each set of paired-end sequence data was aligned using HISAT2, following the published protocol for StringTie, and StringTie was then run on each alignment with a command similar to:

stringtie -B -G Homo_sapiens.GRCh38.88.gtf rnaseq1.bam -o rnaseq1.gtf
stringtie -B -G Homo_sapiens.GRCh38.88.gtf rnaseq2.bam -o rnaseq2.gtf
...
stringtie -B -G Homo_sapiens.GRCh38.88.gtf rnaseqn.bam -o rnaseqn.gtf

I put these file names in a text file called stringtiefilelist:

rnaseq1.gtf
rnaseq2.gtf
...
rnaseqn.gtf
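The per-sample commands and the file list can also be produced in one pass with a loop. This is a minimal sketch, assuming the BAM files match rnaseq*.bam in the current directory; run_stringtie_all is a hypothetical helper name. One deliberate change from the commands above: each sample gets its own directory, because -B writes the Ballgown .ctab tables into the directory of the -o output, so samples sharing one directory would overwrite each other's tables.

```shell
# Hypothetical helper: run StringTie (same options as above) on every
# rnaseq*.bam in the current directory, one output directory per sample,
# and list the resulting GTF files in stringtiefilelist.
run_stringtie_all() {
  annot=$1
  : > stringtiefilelist                 # start with an empty list
  for bam in rnaseq*.bam; do
    [ -e "$bam" ] || continue           # glob matched nothing
    sample=${bam%.bam}                  # e.g. rnaseq1
    mkdir -p "$sample" || return 1
    stringtie -B -G "$annot" "$bam" -o "$sample/$sample.gtf" || return 1
    printf '%s\n' "$sample/$sample.gtf" >> stringtiefilelist
  done
}

# Usage:
#   run_stringtie_all Homo_sapiens.GRCh38.88.gtf
#   stringtie --merge -G Homo_sapiens.GRCh38.88.gtf -o stringtie_merged.gtf stringtiefilelist
```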

I then run stringtie --merge as follows:

stringtie --merge -G Homo_sapiens.GRCh38.88.gtf -o stringtie_merged.gtf stringtiefilelist

Each of these steps produces the expected output. But when I then run gffcompare:

gffcompare -r gencode.v.25.annotation.gtf -o stringtie_merged.txt stringtie_merged.gtf

I get what is, in effect, empty output (every metric is zero):

#= Summary for dataset: stringtie_merged.gtf
#     Query mRNAs :  238442 in   34534 loci  (238442 multi-exon transcripts)
#            (21005 multi-transcript loci, ~6.9 transcripts per locus)
# Reference mRNAs :  171986 in   33152 loci  (171986 multi-exon)
# Super-loci w/ reference transcripts:        0
#-----------------| Sensitivity | Precision  |
        Base level:     0.0     |     0.0    |
        Exon level:     0.0     |     0.0    |
      Intron level:     0.0     |     0.0    |
Intron chain level:     0.0     |     0.0    |
  Transcript level:     0.0     |     0.0    |
       Locus level:     0.0     |     0.0    |

     Matching intron chains:       0
       Matching transcripts:       0
              Matching loci:       0

          Missed exons:  544338/544338  (100.0%)
           Novel exons:  634910/634910  (100.0%)
        Missed introns:  350157/350157  (100.0%)
         Novel introns:  398915/398915  (100.0%)
           Missed loci:   33152/33152   (100.0%)
            Novel loci:   34534/34534   (100.0%)
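The telltale line in this summary is "Super-loci w/ reference transcripts: 0": not a single query locus overlapped a reference locus, so every sensitivity and precision figure is zero as a consequence. A quick check for that condition is sketched below, assuming the summary text has been saved to a file; ref_overlap_zero is a hypothetical helper that only parses the line shown above.

```shell
# Hypothetical helper: succeeds (exit status 0) when a saved gffcompare summary
# reports zero super-loci with reference transcripts.
ref_overlap_zero() {
  awk -F ':' '
    /Super-loci w\/ reference transcripts/ { found = 1; n = $2 + 0 }
    END { exit !(found && n == 0) }
  ' "$1"
}

# Usage (summary.txt stands for wherever the summary above was saved):
#   if ref_overlap_zero summary.txt; then
#     echo "No query locus overlaps the reference; check chromosome names." >&2
#   fi
```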

Is there perhaps a problem with the parameters I am using?

Do I need to convert the GTF file to GFF? A previous successful run used a GFF file as input, so perhaps this is the reason.

gpertea commented 7 years ago

Does the chromosome naming convention match that of the reference? If not, that would be the surest way to miss everything, as seems to be the case here.
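One way to check this is to compare the distinct sequence names (column 1) of the two files. Here the guide annotation was an Ensembl release (Homo_sapiens.GRCh38.88.gtf), whose chromosomes are named 1, 2, ..., X, Y, MT, while GENCODE names them chr1, chr2, ..., chrX, chrY, chrM. The query's names come from the genome the reads were aligned to, so they presumably follow the Ensembl style. A minimal sketch; list_seqids and count_shared_seqids are hypothetical helper names:

```shell
# Print the distinct sequence names (column 1) of a GTF/GFF file,
# skipping comment lines and lines with fewer than 9 tab-separated fields.
list_seqids() {
  awk -F '\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print how many distinct sequence names of file $2 also occur in file $1.
count_shared_seqids() {
  awk -F '\t' '
    /^#/ || NF < 9 { next }
    FILENAME == ARGV[1] { ref[$1] = 1; next }
    ($1 in ref) && !($1 in seen) { seen[$1] = 1; n++ }
    END { print n + 0 }
  ' "$1" "$2"
}

# Usage (file names from this thread):
#   list_seqids gencode.v.25.annotation.gtf | head
#   list_seqids stringtie_merged.gtf | head
#   count_shared_seqids gencode.v.25.annotation.gtf stringtie_merged.gtf
```

A count of 0 from count_shared_seqids means no query transcript can ever overlap a reference transcript, which matches the all-zero summary above.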

adeslatt commented 7 years ago

That was it!
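Once the mismatch is confirmed, one fix is to rename the query's chromosomes before rerunning gffcompare. This sketch assumes the query uses Ensembl-style names and the reference uses GENCODE-style names; add_chr_prefix is a hypothetical helper. Only the primary chromosomes are renamed: unplaced and unlocalized scaffolds are named differently in the two conventions (not just by a missing prefix) and would need a proper mapping table. Rerunning the pipeline against a genome and annotation that share one naming convention is the more thorough alternative.

```shell
# Hypothetical helper: rewrite column 1 from Ensembl-style names
# (1..22, X, Y, MT) to GENCODE-style names (chr1..chr22, chrX, chrY, chrM).
# Comment lines and any other sequence names are passed through unchanged.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }
    $1 == "MT" { $1 = "chrM"; print; next }
    $1 ~ /^([1-9]|1[0-9]|2[0-2]|X|Y)$/ { $1 = "chr" $1 }
    { print }' "$1"
}

# Usage (file names from this thread):
#   add_chr_prefix stringtie_merged.gtf > stringtie_merged.chr.gtf
#   gffcompare -r gencode.v.25.annotation.gtf -o stringtie_merged.txt stringtie_merged.chr.gtf
```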