gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

no refmap/tmap generated from gffcompare? #45

Closed yejilee-biostat closed 4 years ago

yejilee-biostat commented 4 years ago

Hello,

I've run gffcompare with the following command: gffcompare -r reference.gtf -R -C -K -o sample1 query.gff

It generated sample1.combined.gtf, sample1.redundant.gtf, sample1.tracking, sample1.loci, and sample1.stats. However, according to this page, shouldn't I also get sample1.refmap and sample1.tmap? I can't find them, even when I drop the "-C" and "-K" options.

The reference gtf file I used looks like this:

chr1 BestRefSeq exon 11874 12227 . + . transcript_id "NR_046018"; gene_id "100287102";
chr1 BestRefSeq exon 12613 12721 . + . transcript_id "NR_046018"; gene_id "100287102";
chr1 BestRefSeq exon 13221 14409 . + . transcript_id "NR_046018"; gene_id "100287102";
chr1 BestRefSeq exon 14362 14829 . - . transcript_id "NR_024540"; gene_id "653635";

and the sample1.gff looks like this:

chr1 pinfish mRNA 14421 195419 0 - . gene_id "847e65b5-7e7b-428b-936e-6a376f435050"; transcript_id "1afec315-0f47-448d-bcf1-fee886bdfd83|3";
chr1 pinfish exon 14421 14829 0 - . transcript_id "1afec315-0f47-448d-bcf1-fee886bdfd83|3";
chr1 pinfish exon 14970 15038 0 - . transcript_id "1afec315-0f47-448d-bcf1-fee886bdfd83|3";
chr1 pinfish exon 186317 186469 0 - . transcript_id "1afec315-0f47-448d-bcf1-fee886bdfd83|3";

Should I modify my reference or query file to get the refmap/tmap? Any advice would be really helpful!

gpertea commented 4 years ago

The confusion is understandable - the .refmap and .tmap files are actually created as query.gff.refmap and query.gff.tmap (denoted as <gff_in>.refmap and <gff_in>.tmap in the documentation). These two files are also the only outputs that may end up outside the current working directory: they are written to whatever directory the query.gff file is located in (when it is given with a full or relative path).

I realize this does not make much sense when a single query.gff file is provided, but the initial focus of gffcompare (or rather of its predecessor, cuffcompare) was on processing multiple query files (one from each sample) simultaneously. A .refmap and a .tmap file have to be created for each query file, so they cannot share a single name/prefix in the current working directory like the other gffcompare output files.
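For anyone hitting the same thing, here is a minimal shell sketch for locating the per-query map files. It assumes, as described above, that they are written to the directory of the query file and that their names end in the query file name plus .refmap or .tmap. The helper name find_map_files is hypothetical and not part of gffcompare. The leading wildcard also tolerates an output prefix in front of the query file name, in case the installed version adds one; that is an assumption, not something confirmed in this thread.

```shell
# Hypothetical helper (not part of gffcompare): list the .refmap/.tmap
# files that sit next to a given query file.
find_map_files() {
    query=$1
    dir=$(dirname -- "$query")    # directory that holds the query file
    base=$(basename -- "$query")  # e.g. query.gff

    # Match names ending in <query file name>.refmap or .tmap,
    # with or without something in front (assumption: optional prefix).
    for f in "$dir"/*"$base".refmap "$dir"/*"$base".tmap; do
        # An unmatched glob stays literal, so only print real files.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example: look next to a query given with a relative path.
find_map_files path/to/query.gff
```

With several query files, calling the helper once per query should list one .refmap/.tmap pair for each of them, which is the reason the names are tied to the query file rather than to the -o prefix alone.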

yejilee-biostat commented 4 years ago

Oh, I see - I can find these files in the directory having "query.gff", as you said. Thanks for the quick reply!

wuzengding commented 2 years ago

Why are the class codes generated by gffcompare all class_code "u"? I ran it like this:

/mnt/data2/wuzdxxx/03.tools/gffcompare-0.12.6.Linux_x86_64/gffcompare -r /mnt/data2/wuzdxxx/00.reference/hg38.refGene.gtf -o refGene.compare /mnt/data2/wuzdxxx/01.dataset_analysis/09.COLLAPSE/m64014_190506_005857.gff > /mnt/data2/wuzdxxx/01.dataset_analysis/11.GFFCOMPARE/refgffcompare.txt

and the result looks like this:

00000236601"; gene_name "ENSG00000236601"; xloc "XLOC_000010"; class_code "u"; tss_id "TSS12";
00000236601"; gene_name "ENSG00000236601"; xloc "XLOC_000010"; class_code "u"; tss_id "TSS12";
00000269732"; gene_name "ENSG00000269732"; xloc "XLOC_000011"; class_code "u"; tss_id "TSS13";
00000233653"; gene_name "ENSG00000233653"; xloc "XLOC_000012"; class_code "u"; tss_id "TSS14";
00000235146"; gene_name "ENSG00000235146"; xloc "XLOC_000013"; class_code "u"; tss_id "TSS15";
00000235146"; gene_name "ENSG00000235146"; xloc "XLOC_000013"; class_code "u"; tss_id "TSS15";
00000225972"; gene_name "ENSG00000225972"; xloc "XLOC_000014"; class_code "u"; tss_id "TSS16";
00000225630"; gene_name "ENSG00000225630"; xloc "XLOC_000015"; class_code "u"; tss_id "TSS17";