Hi @carajj
Thanks for the question. ORF_ID has the form transcriptid_firststart_lastend_totallength, where transcriptid is the transcript_id from your Ensembl GTF, firststart is the start of the first exon within this ORF, lastend is the end of the last exon that is part of this ORF, and totallength is the number of nucleotides spanning this ORF.
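In case it is useful, here is a minimal sketch of splitting an ORF_ID into those four parts. The helper name parse_orf_id and the OrfId tuple are hypothetical and not part of ribotricer. The sketch assumes the last three underscore-separated fields are always integers, and it splits from the right because some GENCODE transcript IDs contain underscores themselves.

```python
from typing import NamedTuple


class OrfId(NamedTuple):
    """The four parts of an ORF_ID, as described above (hypothetical helper type)."""
    transcript_id: str
    first_start: int
    last_end: int
    total_length: int


def parse_orf_id(orf_id: str) -> OrfId:
    """Split transcriptid_firststart_lastend_totallength into its parts.

    Splitting from the right keeps any underscores inside the
    transcript ID intact. Raises ValueError if the ID has fewer than
    four parts or the last three are not integers.
    """
    transcript_id, first_start, last_end, total_length = orf_id.rsplit("_", 3)
    return OrfId(transcript_id, int(first_start), int(last_end), int(total_length))
```

For example, with a made-up ID, parse_orf_id("ENST00000000001.1_1000_2500_300") gives transcript_id "ENST00000000001.1", first_start 1000, last_end 2500 and total_length 300.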
You can extract the chromosome location and transcript location from the ribotricer index: just look up the row with the ORF_ID and extract the columns chromosome and coordinates: https://github.com/smithlabcode/ribotricer/blob/34bcf7f7c4a19e42b5225641e5eec638376d1eb2/ribotricer/prepare_orfs.py#L357
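A rough sketch of that lookup in plain Python, assuming the index is a tab-separated file with a header row that includes an ORF_ID column. The function name lookup_orf and the file name in the usage note are hypothetical. The exact spelling of the chromosome and coordinate column names may differ in your index, so please check the header line before relying on them.

```python
import csv
from typing import Dict, Optional


def lookup_orf(index_path: str, orf_id: str,
               id_column: str = "ORF_ID") -> Optional[Dict[str, str]]:
    """Return the index row whose ID column equals orf_id, or None.

    Assumes a tab-separated index with a header row. The row is
    returned as a dict keyed by whatever column names the header uses.
    """
    with open(index_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            if row[id_column] == orf_id:
                return row
    return None
```

Calling lookup_orf("my_index_candidate_orfs.tsv", orf_id) returns a dict from which you can read the chromosome and coordinates entries under whatever names your header gives them. It scans the file once per call, so for many ORF_IDs it would be better to load the index into a dict keyed by ORF_ID first.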
Hope this helps! Please feel free to reopen with any follow up questions.
Hello,
I ran ribotricer version 1.3.3 and created the index using GENCODE v35. Below is the result from the test1_translating_ORFs.tsv file:
What does the ORF_ID represent? Is it composed of the transcript ID, transcript start, transcript end, and ORF length in four parts? How can I obtain the start and end positions of the ORF in the genome? Is there any information that can suggest whether the ORF is located in the intergenic region of the genome?
Thanks a lot,