zhpn1024 / ribotish

Ribo-seq TIS Hunter: predicting translation initiation sites and ORFs using Ribo-seq data
http://dx.doi.org/10.1038/s41467-017-01981-8
GNU General Public License v3.0

The meaning and differences in TisType #27

Status: Open. ruixuan-zhang opened this issue 2 years ago.

ruixuan-zhang commented 2 years ago

Dear developer,

Good day. Thank you for developing and maintaining this software.

I was wondering if you could explain the definitions of the different TisType classes.

I see in the README that TisType refers to the position of the TIS relative to the annotated ORF of the transcript.

First, in my results, I got some predictions labeled 3'UTR, 5'UTR and Extended.

Second, I also got some Internal and Internal:CDSFrameOverlap predictions.

Finally, I am working on a viral genome with high coding density. What if a predicted ORF starts in the upstream gene's CDS or 3'UTR and ends in the downstream gene's CDS in a different frame? What will its TisType be: Novel or 3'UTR?

Thank you very much in advance! Ruixuan

zhpn1024 commented 2 years ago

> First, in my results, I got some predictions labeled 3'UTR, 5'UTR and Extended.

  • > Can I understand the class Extended this way: if an assembled transcript from Ribo-seq data aligns to the annotated CDS region, is continuous without a frameshift, and extends outside the annotated CDS, it is annotated as Extended?
    Yes. Without a frameshift or stop codon, the result is an extended form of the annotated CDS.
  • > And 5'UTR and 3'UTR mean that the TIS aligns to these untranslated regions and is not assembled into the transcript of the CDS part (or is not in the same frame)?
    An ORF of the 5'UTR type may overlap the annotated CDS, but not in the same frame.
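
The Extended vs 5'UTR distinction above comes down to two checks on a TIS that lies upstream of the annotated start: is it in the annotated reading frame, and is there an in-frame stop codon before the annotated start? The Python sketch below encodes that rule as described in this thread; it is not RiboTISH's own code. The function name `upstream_tis_type`, the 0-based transcript offsets, and the standard stop codon set (TAA, TAG, TGA) are assumptions made for illustration, and the TIS codon itself is not validated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed standard genetic code


def upstream_tis_type(seq: str, tis: int, cds_start: int) -> str:
    """Hypothetical helper: label a TIS located upstream of the annotated start.

    seq: transcript sequence, 5'->3', DNA alphabet.
    tis, cds_start: 0-based transcript offsets of the first base of each start codon.
    """
    if not 0 <= tis < cds_start:
        raise ValueError("TIS must lie upstream of the annotated start codon")
    if (cds_start - tis) % 3 != 0:
        return "5'UTR"  # different reading frame from the annotated CDS
    # Walk codon by codon in the TIS frame up to (not including) the annotated start.
    for pos in range(tis, cds_start, 3):
        if seq[pos:pos + 3].upper() in STOP_CODONS:
            return "5'UTR"  # in frame, but the ORF stops before reaching the CDS
    return "Extended"  # in frame and uninterrupted: an N-terminal extension
```

For example, `upstream_tis_type("CTGAAAATGCCCTAA", 0, 6)` returns "Extended", while the same call on "CTGTAGATGCCC" returns "5'UTR" because of the in-frame TAG.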

> Second, I also got some Internal and Internal:CDSFrameOverlap predictions.

  • > I see CDSOverlap means the ORF overlaps with an annotated CDS in another transcript in the same reading frame.
  • > Does Internal mean that a predicted ORF

    • > is located within an annotated CDS (both ends within the annotated one)?
      Only the TIS position is considered; both ends need not lie within the annotated CDS.
    • > is in a different frame?
      Right.
  • > Does Internal:CDSFrameOverlap mean that a predicted ORF is located within an annotated CDS but in the same frame?
    The predicted ORF is not in the same frame as the annotated CDS, but it is in the same frame as a CDS in another transcript.
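
For a TIS that falls inside the annotated CDS, the answers above say only the TIS position and its frame matter. A hedged Python sketch of that logic follows; `internal_tis_type` and `in_frame` are hypothetical helpers, not part of RiboTISH. It assumes all positions are 0-based, on the same strand, and on one unspliced coordinate axis (typical of compact viral genomes, but still an assumption), with CDS spans given as half-open (start, end) pairs. The in-frame case is labeled Truncated, matching the maintainer's later confirmation in this thread.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int]  # half-open (start, end), 0-based, same strand


def in_frame(pos: int, cds: Span) -> bool:
    """True if pos lies inside the CDS span and in that CDS's reading frame."""
    start, end = cds
    return start <= pos < end and (pos - start) % 3 == 0


def internal_tis_type(tis: int, cds: Span, other_cdss: Iterable[Span] = ()) -> str:
    """Hypothetical helper: label a TIS downstream of the annotated start, inside the CDS.

    cds: the CDS annotated on this transcript.
    other_cdss: CDS spans annotated on other transcripts (same coordinate axis).
    """
    start, end = cds
    if not start < tis < end:
        raise ValueError("TIS must lie inside the CDS, downstream of the annotated start")
    if (tis - start) % 3 == 0:
        return "Truncated"  # same frame as this transcript's CDS
    if any(in_frame(tis, other) for other in other_cdss):
        return "Internal:CDSFrameOverlap"  # off-frame here, in frame with another CDS
    return "Internal"  # off-frame and not in frame with any other CDS
```

With an annotated CDS at (100, 400), a TIS at 103 gives Truncated and a TIS at 104 gives Internal; the TIS at 104 becomes Internal:CDSFrameOverlap if another transcript's CDS spans (50, 300), since 104 - 50 = 54 is a multiple of 3.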

> Finally, I am working on a viral genome with high coding density. What if a predicted ORF starts in the upstream gene's CDS or 3'UTR and ends in the downstream gene's CDS in a different frame? What will its TisType be: Novel or 3'UTR?

The type is based on the start position only. It should be 5'UTR if the ORF starts upstream of the transcript's annotated CDS. 'Novel' means the transcript has no CDS annotation, so the type depends on the annotation of the given transcript.
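
Because the label depends only on where the TIS falls relative to the CDS annotated on the same transcript, the viral question can be answered with a first-pass position check. The sketch below is illustrative only: `tis_region` is a hypothetical function, positions are assumed 0-based, and the CDS is assumed to be a half-open span that includes the stop codon. It returns the coarse class; frame and stop-codon checks then decide between 5'UTR and Extended, or between Truncated and Internal.

```python
from typing import Optional, Tuple


def tis_region(tis: int, cds: Optional[Tuple[int, int]]) -> str:
    """Hypothetical helper: coarse TisType from the TIS position alone.

    cds: half-open (start, end) of the CDS annotated on this transcript,
    or None if the transcript has no CDS annotation.
    """
    if cds is None:
        return "Novel"  # no CDS annotation on this transcript
    start, end = cds
    if tis < start:
        return "5'UTR or Extended"  # decided by frame and in-frame stop codons
    if tis == start:
        return "Annotated"
    if tis < end:
        return "Truncated or Internal"  # decided by frame
    return "3'UTR"
```

For an ORF starting at position 900, the answer changes with the annotation: on a transcript annotated with the downstream gene's CDS (1200, 2400) it falls upstream (5'UTR when out of frame), on a transcript annotated with the upstream gene's CDS (100, 1000) it falls within the CDS, and on a transcript with no CDS it is Novel.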

ruixuan-zhang commented 2 years ago

Thank you for your prompt reply! Now I understand TisType better.

I was wondering if you could help me with one more question.

If a predicted ORF's TIS aligns to the annotated TIS but its end extends beyond the annotated stop, for example because of stop codon recoding or stop codon bypass, what will its TisType be?

My preliminary guess is that the TisType is Annotated, and that the 3' end extension can be found by comparing GenomePos or Start:Stop with the CDS region in the original annotation file. Is that right?

Thank you very much!

zhpn1024 commented 2 years ago

The type should be 'Annotated'. Prediction of 3' end extension is not currently supported. This may happen in the case of different stop codons. If so, you can compare the predicted stop position with the annotated one. In addition, the 3'-extended CDS region may be identified as a separate 3'UTR ORF if it contains a TIS codon.
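
Since RiboTISH does not predict 3' extensions, the comparison suggested above can be done after the fact. This is a minimal sketch, assuming the predicted Start/Stop and the annotated CDS bounds have already been parsed into integer transcript coordinates that use the same convention (parsing the CDS column is not shown, because its exact format is not given in this thread). The helper `three_prime_extension` is hypothetical.

```python
def three_prime_extension(pred_start: int, pred_stop: int,
                          cds_start: int, cds_stop: int) -> int:
    """Hypothetical helper: nucleotides by which a predicted ORF runs past the annotated stop.

    All four positions are transcript coordinates in one shared convention,
    so only their differences matter. ORFs that do not share the annotated
    TIS are not 'Annotated' and return 0 here.
    """
    if pred_start != cds_start:
        return 0
    return max(0, pred_stop - cds_stop)
```

`three_prime_extension(150, 1053, 150, 900)` returns 153, i.e. the ORF continues 51 codons past the annotated stop; an ORF with a different start returns 0 and would need a separate check.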

ruixuan-zhang commented 2 years ago

Thank you very much for your patient explanation!

ruixuan-zhang commented 2 years ago

Dear Zhang,

Good day. Sorry, I have another question about the meaning of "GenomePos" and "Start & Stop".

In the README file, it is written as

Can I understand in a way that

By the way, does Truncated represent cases whose predicted start codon is downstream of the annotated start codon and in the same (f0) frame?

I ask because I got the result below.

[Screenshot of the result, taken 2022-10-04 16:58:34]

Thank you very much in advance.

Ruixuan

zhpn1024 commented 2 years ago

'Truncated' is what you suppose it to be. For the example, could you provide the detailed information, including Transcript, CDS, GenomePos, Start and Stop? Start and Stop are relative to the 5' end of the transcript (usually the 5'UTR) and correspond to the two positions in GenomePos.
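
Start and Stop are offsets from the transcript 5' end, so mapping them back to genomic positions has to walk the exons in transcript order, which on the minus strand means from the highest coordinate downward. Forgetting that reversal is exactly the plotting mistake described further down. A minimal Python sketch, assuming 0-based offsets and 0-based half-open exon intervals; RiboTISH's own coordinate base may differ, so check against a known example before relying on it. `tx_to_genome` is a hypothetical helper.

```python
from typing import List, Tuple


def tx_to_genome(offset: int, exons: List[Tuple[int, int]], strand: str) -> int:
    """Hypothetical helper: map a 0-based transcript offset to a 0-based genome position.

    exons: half-open (start, end) genomic intervals of one transcript, any order.
    strand: "+" or "-"; on "-" the transcript 5' end is the highest coordinate.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # Transcript order: ascending exons on "+", descending exons on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = offset
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # Count forward from the exon start on "+", backward from its last base on "-".
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length
    raise ValueError("offset lies beyond the transcript 3' end")
```

With exons (100, 200) and (300, 400), offset 150 maps to 350 on the plus strand but to 149 on the minus strand, so the genomic start of a minus-strand ORF is numerically larger than its stop.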

zhpn1024 commented 2 years ago

A new module, 'transplot', has been added on GitHub but is not formally released yet. You can git clone the repository and try plotting with 'ribotish transplot' using the '--morecds' option.

ruixuan-zhang commented 2 years ago

Yeah, sure

I found my mistake in plotting: I forgot to consider the strand information.

Now it makes sense: in this case, the start is truncated. GenomePos gives the positions of the start codon and stop codon predicted by RiboTISH, right?

How can I know where the transcript starts? Does RiboTISH follow the annotation in the GFF file?

Thank you!

zhpn1024 commented 2 years ago

Right. The transcript start site comes from the transcript annotation in the GFF file.
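
As an illustration of where that start site comes from, the sketch below recovers each transcript's 5' end from GFF3 exon records by grouping exons on their Parent attribute: the lowest exon start on the plus strand, the highest exon end on the minus strand. It assumes GFF3 syntax (1-based, inclusive coordinates; key=value attributes); GTF files use a different attribute syntax, and RiboTISH's own annotation parser may handle edge cases differently. `transcript_5prime_ends` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def transcript_5prime_ends(gff_lines: List[str]) -> Dict[str, Tuple[str, int, str]]:
    """Hypothetical helper: {transcript_id: (seqid, 1-based 5' end, strand)} from GFF3 exons."""
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # An exon may belong to several transcripts (comma-separated Parent values).
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    ends = {}
    for tid, rows in exons.items():
        seqid, strand = rows[0][0], rows[0][3]
        if strand == "-":
            ends[tid] = (seqid, max(end for _, _, end, _ in rows), strand)
        else:
            ends[tid] = (seqid, min(start for _, start, _, _ in rows), strand)
    return ends
```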

ruixuan-zhang commented 2 years ago

Thank you very much for your patient explanation!