Closed Tim-Yu closed 1 year ago
What version of the GTF are you using? Using v108 from Ensembl, I can confirm that ENST00000703342.1 is indeed on chromosome 8:
$ cat Homo_sapiens.GRCh38.108.chr.gtf | grep ENST00000703342
[..truncated..]
8 havana CDS 129842145 129842205 . - 1 gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; exon_number "14"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; protein_id "ENSP00000515265"; protein_version "1"; tag "basic";
8 havana stop_codon 129842142 129842144 . - 0 gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; exon_number "14"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
8 havana five_prime_utr 130016607 130016727 . - . gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
8 havana five_prime_utr 129970943 129970995 . - . gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
8 havana five_prime_utr 129904499 129904585 . - . gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
8 havana five_prime_utr 129903312 129903350 . - . gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
8 havana five_prime_utr 129880449 129880458 . - . gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
8 havana three_prime_utr 129839595 129842141 . - . gene_id "ENSG00000153310"; gene_version "22"; transcript_id "ENST00000703342"; transcript_version "1"; gene_name "CYRIB"; gene_source "ensembl_havana"; gene_biotype "protein_coding"; transcript_name "CYRIB-231"; transcript_source "havana"; transcript_biotype "protein_coding"; tag "basic";
A search for FAM49B returns nothing:
$ cat Homo_sapiens.GRCh38.108.chr.gtf | grep FAM49B
I also could not locate it on the UCSC genome browser.
I am using GENCODE release 19. I wonder whether the ORF_ID is named after the start and end positions, since chr8:130854336-130854428 is outside the annotated ENST00000703342.1.
I want to get the ORF region at the genome level.
Thanks
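One way to test whether an interval falls inside a transcript's annotated span is sketched below in Python. The rows are abbreviated from the Ensembl v108 excerpt above (the attribute column is trimmed to `transcript_id`, and the excerpt itself is truncated), so the span covers only the rows shown. The helper names `transcript_span` and `contains` are made up for this illustration and are not part of ribotricer.

```python
import re

# Abbreviated rows from the Ensembl v108 excerpt in this thread
# (attribute column trimmed to transcript_id for brevity).
GTF_ROWS = [
    '8\thavana\tfive_prime_utr\t130016607\t130016727\t.\t-\t.\ttranscript_id "ENST00000703342";',
    '8\thavana\tCDS\t129842145\t129842205\t.\t-\t1\ttranscript_id "ENST00000703342";',
    '8\thavana\tthree_prime_utr\t129839595\t129842141\t.\t-\t.\ttranscript_id "ENST00000703342";',
]


def transcript_span(rows, transcript_id):
    """Return (chrom, min_start, max_end) over all rows of one transcript, or None."""
    pattern = re.compile(r'transcript_id "([^"]+)"')
    wanted = transcript_id.split(".")[0]  # ignore the version suffix
    chrom, starts, ends = None, [], []
    for row in rows:
        if row.startswith("#"):
            continue
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = pattern.search(fields[8])
        if match and match.group(1).split(".")[0] == wanted:
            chrom = fields[0]
            starts.append(int(fields[3]))
            ends.append(int(fields[4]))
    if not starts:
        return None
    return chrom, min(starts), max(ends)


def contains(span, start, end):
    """True if [start, end] lies entirely inside the span's coordinates."""
    _, span_start, span_end = span
    return span_start <= start and end <= span_end


span = transcript_span(GTF_ROWS, "ENST00000703342.1")
print(span)
print(contains(span, 130854336, 130854428))
```

With these rows the interval 130854336-130854428 lies well beyond the largest coordinate shown for the transcript, which is consistent with the coordinates coming from a different annotation or assembly than the GTF being searched.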
No worries, it seems the GTF file is somehow shifted. Thanks for your time.
Sorry, I am a bit confused. Using v19 gencode:
$ cat gencode.v19.annotation.gtf | grep FAM49B
[..truncated..]
chr8 HAVANA exon 131028853 131028898 . - . gene_id "ENSG00000153310.14"; transcript_id "ENST00000523514.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "FAM49B"; transcript_type "retained_intron"; transcript_status "KNOWN"; transcript_name "FAM49B-013"; exon_number 1; exon_id "ENSE00002100074.1"; level 2; havana_gene "OTTHUMG00000164805.3"; havana_transcript "OTTHUMT00000380405.1";
chr8 HAVANA exon 130982755 130983241 . - . gene_id "ENSG00000153310.14"; transcript_id "ENST00000523514.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "FAM49B"; transcript_type "retained_intron"; transcript_status "KNOWN"; transcript_name "FAM49B-013"; exon_number 2; exon_id "ENSE00002097805.1"; level 2; havana_gene "OTTHUMG00000164805.3"; havana_transcript "OTTHUMT00000380405.1";
chr8 HAVANA transcript 130982922 131028802 . - . gene_id "ENSG00000153310.14"; transcript_id "ENST00000518285.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "FAM49B"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "FAM49B-014"; level 2; havana_gene "OTTHUMG00000164805.3"; havana_transcript "OTTHUMT00000380406.1";
chr8 HAVANA exon 131028616 131028802 . - . gene_id "ENSG00000153310.14"; transcript_id "ENST00000518285.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "FAM49B"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "FAM49B-014"; exon_number 1; exon_id "ENSE00002118237.1"; level 2; havana_gene "OTTHUMG00000164805.3"; havana_transcript "OTTHUMT00000380406.1";
chr8 HAVANA exon 130982922 130983241 . - . gene_id "ENSG00000153310.14"; transcript_id "ENST00000518285.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "FAM49B"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "FAM49B-014"; exon_number 2; exon_id "ENSE00002101537.1"; level 2; havana_gene "OTTHUMG00000164805.3"; havana_transcript "OTTHUMT00000380406.1";
and there is no transcript with ID ENST00000703342:
$ cat gencode.v19.annotation.gtf| grep ENST00000703342
(no results)
Also, there is no gene named CYRIB:
$ cat gencode.v19.annotation.gtf| grep CYRIB
(no results)
Can you point me to the gtf and fasta files you used for creating the ribotricer index? Ideally these should be the same as what you used for mapping (using STAR or any other aligner).
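A quick sanity check along these lines is to compare the sequence names in the GTF with those in the FASTA. The Python sketch below uses tiny in-memory samples; the FASTA header is hypothetical and the helper names are made up for illustration. Matching names are necessary but not sufficient: two files from different assemblies can share chromosome names and still disagree on coordinates.

```python
import io


def gtf_seqnames(handle):
    """Collect the first column of every non-empty, non-comment GTF line."""
    return {
        line.split("\t", 1)[0]
        for line in handle
        if line.strip() and not line.startswith("#")
    }


def fasta_seqnames(handle):
    """Collect sequence names: the first token after '>' on each header line."""
    return {
        line[1:].split()[0]
        for line in handle
        if line.startswith(">") and line[1:].strip()
    }


# Sample data: one Ensembl-style GTF row and one hypothetical UCSC-style FASTA record.
gtf = io.StringIO('8\thavana\tCDS\t129842145\t129842205\t.\t-\t1\tgene_name "CYRIB";\n')
fasta = io.StringIO(">chr8 hypothetical header\nACGT\n")

# Names used by the annotation that the genome FASTA does not provide.
missing = gtf_seqnames(gtf) - fasta_seqnames(fasta)
print(sorted(missing))
```

For real files, replace the `io.StringIO` objects with `open(path)` handles for the GTF and FASTA that were passed to the aligner and to the index step.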
I have figured out the problem: I did not point to the same FASTA file in the workflow I generated. Thanks.
Hi saketkc,
I hope all is well.
I want to trace an ORF back to its coordinates on the genome, and I thought the ORF_ID and chrom columns would be enough. I assumed the ORF_ID is the combination of tx_id, start, end, and ORF length, but the naming does not seem to work that way. May I ask what the naming strategy for ORF_ID is, and where I can find the ORF coordinates? E.g.:
ENST00000703342.1_1_130854336_130854428_93| overlap_dORF| translating | 0.8277447| 33| 93| 8| 0.25806452| 1.06451613| ENST00000703342.1_1| protein_coding| ENSG00000153310.22_12| CYRIB| protein_coding| chr8| -|
while chr8:130854336-130854428 is FAM49B?
![image](https://user-images.githubusercontent.com/25986548/213451077-15ab762b-2253-4b70-b39d-1feb69a6975b.png)
Many many thanks,
Tim
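For reference, the example row above can be split into its apparent parts with the Python sketch below. The layout (transcript ID, start, end, length, joined by underscores) is inferred from this single row and is an assumption, not a confirmed description of ribotricer's naming; `parse_orf_id` is a hypothetical helper.

```python
from typing import NamedTuple


class OrfId(NamedTuple):
    transcript_id: str
    start: int
    end: int
    length: int


def parse_orf_id(orf_id: str) -> OrfId:
    """Split an ORF_ID from the right into transcript ID, start, end, length."""
    # Split from the right because the transcript ID itself may contain "_"
    # (here "ENST00000703342.1_1").
    transcript_id, start, end, length = orf_id.rsplit("_", 3)
    return OrfId(transcript_id, int(start), int(end), int(length))


orf = parse_orf_id("ENST00000703342.1_1_130854336_130854428_93")
print(orf)
# Inclusive span of the two coordinates; for this row it equals the length field.
print(orf.end - orf.start + 1)
```

For this row the inclusive span of the two coordinates equals the trailing length field (93), which fits the assumed layout. An ORF spanning several exons would have a span larger than its length, so the equality should not be expected in general.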