gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

0 reference transcripts loaded. #52

Closed nattzy94 closed 4 years ago

nattzy94 commented 4 years ago

Hi,

I am trying to use gffcompare to compare my assembled transcriptome to a reference gtf that contains information about small open reading frames (sORFs). The reference gtf was obtained by processing downloaded data from several sORF databases. The reference gtf is called all_38.gtf and is attached at the end.

mytranscriptome.gtf was generated against the Ensembl hg38 v99 reference.

I used the command:

/home/e0470749/gffcompare/gffcompare -r /gpfs/eplab/Nathaniel/norf_replicate/all_38.gtf -o /gpfs/eplab/Nathaniel/norf_replicate/norfs mytranscriptome.gtf

This outputs a message file:

0 reference transcripts loaded.
  237788 query transfrags loaded.
  2714 duplicate query transfrags discarded.

All the expected output files are generated except for the .refmap file which I need for downstream analysis. I assume this is because 0 reference transcripts were loaded into the program. Is there something wrong with my custom reference gtf file?

all_38.gtf.zip

gpertea commented 4 years ago

Sorry for the terribly late reply. If you are still looking into this issue, here is the solution: the GTF file you have there only has "gene" features, so no transcripts are found because there are no exons defined there. You could use the gffread utility with the --gene2exon option to convert that file to a transcript file that gffcompare understands.

gffread all_38.gtf --gene2exon -o all_38.gff

Then use the resulting all_38.gff with gffcompare instead of your original all_38.gtf.
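A quick way to confirm this diagnosis is to tally the feature types in column 3 of the file. The snippet below is a minimal sketch, not part of gffcompare or gffread; the helper name count_features is made up for illustration. A file that lists only "gene" and no "exon" or "transcript" entries would match the situation described above.

```shell
# Tally the feature types found in column 3 of a GTF/GFF file.
# count_features is a hypothetical helper name, not part of gffcompare or gffread.
count_features() {
    # Skip comment lines and lines with fewer than 3 tab-separated fields,
    # then print "count feature" pairs, most frequent first.
    awk -F'\t' '!/^#/ && NF >= 3 { n[$3]++ } END { for (f in n) print n[f], f }' "$@" | sort -rn
}

# Example: count_features all_38.gtf
```

Running it again on the converted all_38.gff should show exon features alongside the gene features.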

nattzy94 commented 4 years ago

Hi Geo, thanks for the reply. I used the gene2exon function as suggested and the gtf file now correctly contains exon features. However, now I wish to use the gtf file for RNAseq analysis. In particular, I would like to look at transcript level expression.

I am wondering if the transcript level expression can be quantified using the new version of the gtf since the gtf now only contains gene and exon features. Is there a similar gene2transcript function that will allow me to add in transcript level information?

Thanks very much for your help!

gpertea commented 4 years ago

Most GFF parsers should be able to understand that file with just gene and exon features and take the genes as transcripts. Also, if you add -T to the gffread command I sent above you'll get an output GTF with transcript and exon features.

If you still want to use GFF instead of GTF, there is no gene2transcript option for gffread, but in the case of your file (and in general for prokaryotes) this can be accomplished easily on the resulting GFF with a simple search-and-replace command, e.g.:

sed -e 's/\tgene\t/\ttranscript\t/' all_38.gff > all_38.transcripts.gff
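The sed command above relies on sed interpreting \t as a tab, which some sed implementations do not do. As a hedged alternative, the same replacement can be sketched with awk, which also restricts the change to the feature-type column. The helper name gene_to_transcript is hypothetical and only used here for illustration.

```shell
# Rewrite "gene" to "transcript" in column 3 only, leaving all other lines untouched.
# gene_to_transcript is a hypothetical helper name, not a gffread option.
gene_to_transcript() {
    # Reads the named file(s), or standard input when no file is given.
    awk 'BEGIN { FS = OFS = "\t" } $3 == "gene" { $3 = "transcript" } { print }' "$@"
}

# Example: gene_to_transcript all_38.gff > all_38.transcripts.gff
```

Because only the third field is compared, a "gene" string appearing elsewhere on a line (for example in the attributes column) is left as-is.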