zhpn1024 / ribotish

Ribo-seq TIS Hunter, predicting translation initiation sites and ORFs using riboseq data
http://dx.doi.org/10.1038/s41467-017-01981-8
GNU General Public License v3.0

Feature request: Ribo-seq counts in the results #11

Open nsotta opened 4 years ago

nsotta commented 4 years ago

Thank you for the great tool! It seems to be working perfectly for me, but I would like to request a feature. Would it be possible to add a column with the in-frame Ribo-seq (not TI-seq) count per ORF to the output of ribotish predict, like the TISCount column for TI-seq? I believe this would be useful for the following reasons (I'm detecting ORFs using Ribo-seq only, without TI-seq):

  1. The current output includes ORFs with very low read counts. I understand that this helps sensitivity, but in some cases users may want to filter by CPM, etc., to extract highly translated ORFs.
  2. With the --framebest option, one stop codon can have multiple TISs when RiboPvalue has ties (especially when RiboPvalue = 0). One option for filtering is to take the longest ORF, but it would be more helpful if users could take P-site read abundance into account to pick the ORF with the highest translation activity.
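
To make the two use cases concrete, here is a minimal Python sketch of the filtering I have in mind. It assumes each predicted ORF is available as a record with an in-frame count; the field names (`tid`, `start`, `stop`, `count`), the function name, and the `total_reads` value are hypothetical and are not part of the current ribotish output.

```python
def pick_best_per_stop(orfs, total_reads, min_cpm=0.0):
    """Filter ORFs by CPM, then keep one ORF per (transcript, stop codon).

    orfs: list of dicts with hypothetical keys 'tid', 'start', 'stop', 'count'
    total_reads: library size used for CPM normalisation (must be > 0)
    """
    best = {}
    for orf in orfs:
        # Counts per million mapped reads for this ORF
        cpm = orf["count"] * 1e6 / total_reads
        if cpm < min_cpm:
            continue
        key = (orf["tid"], orf["stop"])
        current = best.get(key)
        # Prefer the higher in-frame count; on a tie, prefer the longer ORF
        # (the one with the more upstream start).
        rank = (orf["count"], -orf["start"])
        if current is None or rank > (current["count"], -current["start"]):
            best[key] = orf
    return list(best.values())


orfs = [
    {"tid": "T1", "start": 30, "stop": 300, "count": 120},
    {"tid": "T1", "start": 90, "stop": 300, "count": 150},
    {"tid": "T2", "start": 12, "stop": 60, "count": 1},
]
selected = pick_best_per_stop(orfs, total_reads=1_000_000, min_cpm=5)
print(selected)
```

In this toy input the low-count ORF on T2 is dropped by the CPM threshold, and of the two T1 ORFs sharing a stop codon the one with the higher count is kept.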

I would appreciate it if you could consider implementing this feature, only if it would not cause too much trouble for you. Many thanks, Sotta

zhpn1024 commented 4 years ago

Thank you. Good suggestion. For details:

  1. The --transprofile option can output a transcript-level P-site profile. The in-frame counts can be extracted or calculated from that file.
  2. The count you want is the sum of all counts at the in-frame positions in the ORF, right?
  3. I can add a new option like --inframecount, and a column of count values can be added in output ORF table with the option.
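
As a rough illustration of point 2, the sum could be computed as in the Python sketch below. It assumes the P-site profile of one transcript has already been loaded into a dict mapping 0-based transcript position to read count; that layout, the function name, and the coordinate convention (`stop` exclusive) are assumptions for the example, not a description of the actual implementation or file format.

```python
def inframe_count(profile, tis, stop):
    """Sum P-site counts at the in-frame positions of one ORF.

    profile: dict {transcript position (0-based): read count}, assumed layout
    tis:     position of the first base of the start codon
    stop:    end of the region to count (exclusive)
    """
    # In-frame positions are tis, tis + 3, tis + 6, ... up to (not including) stop.
    return sum(profile.get(pos, 0) for pos in range(tis, stop, 3))


# Toy profile: positions 10, 13, 16 and 19 are in frame with a TIS at 10.
profile = {10: 5, 11: 1, 13: 7, 15: 2, 16: 4, 19: 3}
print(inframe_count(profile, tis=10, stop=22))  # 5 + 7 + 4 + 3 = 19
```

Positions missing from the profile are treated as zero, so a sparse profile that only records non-zero counts works without extra handling.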

Peng

nsotta commented 4 years ago

Thank you for your quick response.

  2. The count you want is the sum of all counts at the in-frame positions in the ORF, right?

Yes, exactly.

  3. I can add a new option like --inframecount, and a column of count values can be added in output ORF table with the option.

This would be perfect for me.

Thank you very much for handling this. I would appreciate it if you could also update the Anaconda Cloud package with the new version, if possible.