marbl / CHM13

The complete sequence of a human genome

SF3B3 gene annotation missing in gff3 #74

Closed fbrundu closed 1 year ago

fbrundu commented 1 year ago

I am interested in the SF3B3 gene, and I was able to find its entry in the gff3 file for hg38 (e.g., GENCODE 43):

chr16   HAVANA  gene    70523791    70577670    .   +   .   ID=ENSG00000189091.13;gene_id=ENSG00000189091.13;gene_type=protein_coding;gene_name=SF3B3;level=1;hgnc_id=HGNC:10770;tag=ncRNA_host;havana_gene=OTTHUMG00000137582.8

However, I cannot find an entry of type "gene" (third column) for this gene in the gff3 posted in this repository. Other feature types for this gene, e.g., "transcript", are available:

CDS
exon
intron
start_codon
stop_codon
transcript

What could be the reason for this, and is there a workaround?
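
The check described above (which feature types exist for a given gene) can be reproduced with a short script. This is only a minimal sketch: it assumes the records carry a `gene_name` attribute in column 9, which is true of the GENCODE line quoted above but is an assumption for the gff3 in this repository, so the attribute key is left as a parameter. The function names are illustrative.

```python
from collections import Counter


def parse_attributes(field):
    """Parse a GFF3 column-9 string (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def feature_types_for_gene(lines, gene_name, name_key="gene_name"):
    """Count feature types (column 3) of records whose name_key equals gene_name.

    name_key is an assumption; change it to whatever attribute the file
    actually uses to label records with the gene name.
    """
    counts = Counter()
    for line in lines:
        # Skip comments, directives, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        if parse_attributes(cols[8]).get(name_key) == gene_name:
            counts[cols[2]] += 1
    return counts
```

Passing an open file handle as `lines` and `"SF3B3"` as `gene_name` returns a `Counter` keyed by feature type; a missing `gene` key corresponds to the situation described in this issue.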

diekhans commented 1 year ago

It is unclear why a gene record was not written for this gene. An issue was created for CAT; however, the software is between maintainers. You could generate the gene records yourself. Also, Ensembl has CHM13 annotations in its HPRC release, and NCBI has released CHM13 annotations as well.

https://github.com/ComparativeGenomicsToolkit/Comparative-Annotation-Toolkit/issues/286
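
One way to generate the missing gene records is to derive each gene's span from its transcripts. The following is a minimal sketch, not CAT's own logic. It assumes that each `transcript` record has a `Parent` attribute holding a single gene ID and that all transcripts of a gene lie on the same sequence and strand. The `source` value `synthesized` and the function names are made up for illustration.

```python
def parse_attributes(field):
    """Parse a GFF3 column-9 string (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def synthesize_gene_records(lines, source="synthesized"):
    """Return GFF3 gene lines for parents of transcripts that lack a gene record."""
    have_gene = set()  # IDs of gene records already present in the file
    spans = {}         # gene ID -> [seqid, strand, min start, max end]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "gene":
            have_gene.add(attrs.get("ID"))
        elif cols[2] == "transcript" and "Parent" in attrs:
            # Assumption: Parent names exactly one gene ID.
            gene_id = attrs["Parent"]
            start, end = int(cols[3]), int(cols[4])
            span = spans.setdefault(gene_id, [cols[0], cols[6], start, end])
            span[2] = min(span[2], start)
            span[3] = max(span[3], end)
    records = []
    for gene_id, (seqid, strand, start, end) in spans.items():
        if gene_id in have_gene:
            continue  # the file already has a gene record for this ID
        records.append("\t".join(
            [seqid, source, "gene", str(start), str(end), ".", strand, ".",
             f"ID={gene_id}"]
        ))
    return records
```

The synthesized lines carry only an `ID` attribute; other attributes such as the gene name would need to be copied from the transcript records if downstream tools depend on them.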

arangrhie commented 1 year ago

Hello, just to add to this.

SF3B3 is present in the curated RefSeq/Liftoff annotation as follows:

chr16   Liftoff gene    76334997        76388857        .       +       .       ID=SF3B3;gene_name=SF3B3;db_xref=MIM:605592;description=splicing factor 3b subunit 3;gbkey=Gene;gene=SF3B3;gene_biotype=protein_coding;gene_synonym=STAF130;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=SF3B3_0