gpertea / gffcompare

classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
MIT License

switched to gene_id instead of gene_name in getGeneID by default #51

Closed alevar closed 4 years ago

alevar commented 4 years ago

Since the gene_id attribute is unique and more likely to be present in GTF files, I found it more useful to report it by default in the ref_gene_id field of the refmap and tmap files. When that column defaulted to gene_name, it was particularly difficult, and sometimes impossible, to use the value for referencing without also using the transcript_id. Additionally, the column is named "ref_gene_id", so this behavior seems like the more expected one.

Also, in the web documentation for gffcompare, the .refmap section lists gene_name as the first column, but that column is named gene_id. I think either the website description or the code needs to be corrected.

gpertea commented 4 years ago

About the .refmap file, I actually wanted to prioritize a gene name/symbol if present -- my impression was that biologists looking at the data preferred those more explicit gene names to the dry gene IDs. I amended the web page and the table header to only specify ref_gene (but still preferring gene name if present).

alevar commented 4 years ago

That makes sense - I wasn't sure if the gene_name attribute was preferred for a reason, and it seemed a little conflicting with the header and the name of the function.
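
For readers following the thread, the two behaviors under discussion can be sketched roughly as follows. This is only an illustration in Python, not gffcompare's actual C++ code: the helper names `parse_gtf_attributes` and `ref_gene_label`, the `prefer_name` switch, and the sample identifiers are all hypothetical.

```python
import re

# Matches GTF-style key/value attributes, e.g.: gene_id "G0001"; gene_name "ABC1";
_ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_attributes(attr_field):
    """Parse the attribute column (column 9) of a GTF line into a dict."""
    return dict(_ATTR_RE.findall(attr_field))


def ref_gene_label(attrs, prefer_name=True):
    """Pick the value to report in the reference-gene column.

    prefer_name=True  -> gene_name if present, else gene_id
                         (the preference described by the maintainer).
    prefer_name=False -> gene_id if present, else gene_name
                         (the default proposed in this pull request).
    Returns None when neither attribute is present.
    """
    if prefer_name:
        keys = ("gene_name", "gene_id")
    else:
        keys = ("gene_id", "gene_name")
    for key in keys:
        value = attrs.get(key)
        if value:
            return value
    return None


# Made-up attribute string, used only for illustration.
attrs = parse_gtf_attributes(
    'gene_id "G0001"; gene_name "ABC1"; transcript_id "T0001.1";'
)
print(ref_gene_label(attrs))                     # gene name preferred
print(ref_gene_label(attrs, prefer_name=False))  # gene ID preferred
```

With the gene name preferred, the reported label is friendlier to read but is not guaranteed to be unique, which is why the ID-first variant is easier to use as a lookup key.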