gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

Why not two '=' #46

Open wangshun1121 opened 4 years ago

wangshun1121 commented 4 years ago

This is my test GTF file, saved as 'Test.gtf':

1       StringTie       transcript      9182204 9192089 1000    +       .       gene_id "origin.STRG.178"; transcript_id "origin.STRG.178.1"; reference_id "ENST00000412639"; ref_gene_id "ENSG00000234546"; ref_gene_name "LINC01759"; cov "3.691087"; FPKM "0.354073"; TPM "0.688700";
1       StringTie       exon    9182204 9182316 1000    +       .       gene_id "origin.STRG.178"; transcript_id "origin.STRG.178.1"; exon_number "1"; reference_id "ENST00000412639"; ref_gene_id "ENSG00000234546"; ref_gene_name "LINC01759"; cov "0.261039";
1       StringTie       exon    9183633 9183807 1000    +       .       gene_id "origin.STRG.178"; transcript_id "origin.STRG.178.1"; exon_number "2"; reference_id "ENST00000412639"; ref_gene_id "ENSG00000234546"; ref_gene_name "LINC01759"; cov "4.634286";
1       StringTie       exon    9191393 9192089 1000    +       .       gene_id "origin.STRG.178"; transcript_id "origin.STRG.178.1"; exon_number "3"; reference_id "ENST00000412639"; ref_gene_id "ENSG00000234546"; ref_gene_name "LINC01759"; cov "4.010363";
1       StringTie       transcript      9182752 9182952 1000    .       .       gene_id "origin.STRG.179"; transcript_id "origin.STRG.179.1"; cov "6.796020"; FPKM "0.651919"; TPM "1.268033";
1       StringTie       exon    9182752 9182952 1000    .       .       gene_id "origin.STRG.179"; transcript_id "origin.STRG.179.1"; exon_number "1"; cov "6.796020";

I ran the following command:

gffcompare -r Test.gtf Test.gtf 

and this is the resulting gffcmp.tracking:

TCONS_00000001  XLOC_000001     origin.STRG.178|origin.STRG.178.1       =       q1:origin.STRG.178|origin.STRG.178.1|3|0.354073|0.688700|3.691087|985
TCONS_00000002  XLOC_000002     origin.STRG.178|origin.STRG.178.1       i       q1:origin.STRG.179|origin.STRG.179.1|1|0.651919|1.268033|6.796020|201

As you can see, I compared this file with itself, yet the two transcripts were not both marked '='. Instead, origin.STRG.179.1 was marked 'i' relative to origin.STRG.178.1. Is there any explanation?

gpertea commented 4 years ago

Yes. "Reference" transcripts (those in the file given with the -r option) cannot be unstranded, as origin.STRG.179.1 is in your example, so such reference transcripts are discarded before the comparison is performed. Running gffcompare with the -V option should give more information (perhaps too much) about how the input and reference data are processed, and discarded or adjusted as the case may be.

Btw, thanks to your message I just fixed a bug in the -V output, in the warning message about discarding unstranded reference transcripts.
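
To check a file before passing it to -r, the rule described above can be approximated with a short script. This is only a sketch of that rule as stated in the reply, not gffcompare's own logic: the function name unstranded_transcripts is hypothetical, the sample lines are the two transcript records from Test.gtf with their attributes trimmed, and the whitespace fallback for splitting columns is an assumption made because the pasted example lost its tabs.

```python
import re

# Two transcript records from the Test.gtf shown above, attributes trimmed.
SAMPLE = [
    '1 StringTie transcript 9182204 9192089 1000 + . '
    'gene_id "origin.STRG.178"; transcript_id "origin.STRG.178.1";',
    '1 StringTie transcript 9182752 9182952 1000 . . '
    'gene_id "origin.STRG.179"; transcript_id "origin.STRG.179.1";',
]


def unstranded_transcripts(gtf_lines):
    """Return transcript_id values of transcript records whose strand is not '+' or '-'."""
    ids = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        # GTF columns are tab-separated; fall back to whitespace for pasted text.
        fields = line.split("\t") if "\t" in line else line.split(None, 8)
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only look at transcript-level records
        if fields[6] not in ("+", "-"):  # column 7 is the strand
            match = re.search(r'transcript_id "([^"]+)"', fields[8])
            ids.append(match.group(1) if match else "<no transcript_id>")
    return ids


if __name__ == "__main__":
    for transcript_id in unstranded_transcripts(SAMPLE):
        print(transcript_id)
```

On the sample it reports origin.STRG.179.1, the record with '.' in the strand column. With that transcript absent from the reference set, the query origin.STRG.179.1 (9182752-9182952) can only be compared with origin.STRG.178.1, and it lies between that transcript's first exon (ending at 9182316) and second exon (starting at 9183633), which is consistent with the 'i' code in gffcmp.tracking.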