smithlabcode / ribotricer

A tool for accurately detecting actively translating ORFs from Ribo-seq data
http://doi.org/djv4
GNU General Public License v3.0

Unable to reproduce results from published studies #69

Closed HiteshKore closed 2 years ago

HiteshKore commented 2 years ago

Hi @saketkc,

Thank you for your prompt reply to my previous issue and for clarifying my doubts. I am using Ribotricer for my analysis and it is performing well. I want to assess how accurately it recovers the ORFs reported in published studies. For this, I downloaded the controlled-access data from Heesch et al., Cell, 2019 (https://pubmed.ncbi.nlm.nih.gov/31155234/), in which the authors identified 209 novel ORFs from ribosome profiling data generated for 80 human heart tissues using the RiboTaper approach. I reformatted these ORFs according to the Ribotricer database format (file attached) and carried out the Ribo-seq analysis. Unfortunately, I was unable to detect any of these candidate ORFs.

Here is the command for your reference:

ribotricer detect-orfs --bam hs_lv_001rawsortedByCoord.out.bam --ribotricer_index RiboTricer_customDB.txt --prefix hs_lv_001 --phase_score_cutoff 0.44

Could you please provide a possible explanation for this?

I appreciate your kind help.

Thank you

Best regards, Hitesh

RiboTricer_customDB_07072021_posControl.txt

saketkc commented 2 years ago

Hi @HiteshKore, this could happen for multiple reasons. Can you share the ribotricer output file? Also, how was the cutoff chosen in this case?

HiteshKore commented 2 years ago

Thank you for your kind response.

I used the ORFs from the Heesch et al., Cell, 2019 study and ran ribotricer with a phase score cutoff of 0.44, as recommended for human datasets (please refer to the command in my previous comment). It generated all the intermediate files except the translating_ORFs.tsv file.

Below is the ribotricer log:

Jul 16 11:50:38 ..... started ribotricer detect-orfs
Jul 16 11:50:38 ... started parsing ribotricer index file
Jul 16 11:50:38 ... started inferring experimental design
[E::idx_find_and_load] Could not retrieve index file for '/working/lab_harsgag/hiteshK/Riboseq_analysis/Heart_translatome/Trimmed/Output/hs_lv_001_D_Ri_raw_trmAligned.sortedByCoord.out.bam'
Jul 16 11:55:16 ... started reading bam file
[E::idx_find_and_load] Could not retrieve index file for '/working/lab_harsgag/hiteshK/Riboseq_analysis/Heart_translatome/Trimmed/Output/hs_lv_001_D_Ri_raw_trmAligned.sortedByCoord.out.bam'
0%| | 0/59195198 [00:00<?, ?reads/s]
[E::idx_find_and_load] Could not retrieve index file for '/working/lab_harsgag/hiteshK/Riboseq_analysis/Heart_translatome/Trimmed/Output/hs_lv_001_D_Ri_raw_trmAligned.sortedByCoord.out.bam'
Jul 16 12:02:10 ... started plotting read length distribution
Jul 16 12:02:11 ... started calculating metagene profiles. This may take a long time...
Jul 16 12:02:11 ... started plotting metagene profiles
Jul 16 12:02:11 ... started inferring P-site offsets
WARNING: no periodic read length found... using cutoff 0.44

saketkc commented 2 years ago

It appears that the default cutoff of 0.44 might be too high in this case. You can set the cutoff to something lower, say 0.33, and look at the distribution of the scores by switching on the --report_all flag. It should then produce an output file.
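To inspect the score distribution from a `--report_all` run, something along these lines may help. This is only a sketch: it assumes the output is a tab-separated file with a header row containing a `phase_score` column (check the header of your own file), and the function name `summarize_phase_scores` is made up for illustration.

```python
import csv
import io
import statistics


def summarize_phase_scores(handle, cutoffs=(0.33, 0.44), column="phase_score"):
    """Summarize the phase scores found in a tab-separated ORF table.

    `column` is an assumed header name; adjust it to match your file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    scores = []
    for row in reader:
        try:
            scores.append(float(row.get(column, "")))
        except (TypeError, ValueError):
            # Skip rows where the score is missing or not numeric.
            continue

    summary = {"n": len(scores)}
    if scores:
        summary["median"] = statistics.median(scores)
        summary["max"] = max(scores)
    # Count how many ORFs would pass each candidate cutoff.
    for cutoff in cutoffs:
        summary[f">={cutoff}"] = sum(score >= cutoff for score in scores)
    return summary


# Toy table standing in for a real output file (values are invented).
toy_table = io.StringIO(
    "ORF_ID\tphase_score\n"
    "orf_a\t0.10\n"
    "orf_b\t0.35\n"
    "orf_c\t0.50\n"
)
print(summarize_phase_scores(toy_table))
```

With a real file, pass an open handle instead, e.g. `open("<prefix>_translating_ORFs.tsv")`, where the filename pattern is again an assumption. If even the maximum score is far below the cutoff, lowering the cutoff alone is unlikely to recover the ORFs.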

HiteshKore commented 2 years ago

Hi Saket, even after reducing the cutoff, I could not get any output.

A query regarding the ribosome footprints: I am using different Ribo-seq datasets for my analysis, and the ribosome footprint size varies between datasets (ranging from 28-40 bp). Do longer footprints affect the prediction of candidate ORFs in ribotricer? I did not specify the --read_lengths option and let the tool determine the best footprint lengths. Could you please provide your suggestions on this? Thank you. Kind regards, Hitesh

saketkc commented 2 years ago

Ribotricer detects the best P-site offset for each read length, so a range of read lengths is not a problem. However, a 40 bp footprint is more likely a technical artifact of incomplete trimming. You can plot the distribution of fragment lengths to see which lengths are most abundant; generally these are expected to be in the 28-32 bp range.
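A rough way to check the fragment-length distribution is sketched below. It is not part of ribotricer: the helper names are hypothetical, and it assumes you have already collected the aligned read lengths into a list (for example from a BAM file via pysam's `AlignedSegment.query_length`, which is not shown here).

```python
from collections import Counter


def read_length_distribution(lengths, expected=range(28, 33)):
    """Count reads per length and report the fraction inside `expected`.

    `expected` defaults to 28-32, the range mentioned above.
    """
    counts = Counter(lengths)
    total = sum(counts.values())
    in_range = sum(n for length, n in counts.items() if length in expected)
    fraction = in_range / total if total else 0.0
    return counts, fraction


def text_histogram(counts, width=40):
    """Render the counts as plain-text bars, one line per read length."""
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for length in sorted(counts):
        bar = "#" * max(1, round(width * counts[length] / peak))
        lines.append(f"{length:>3} {counts[length]:>8} {bar}")
    return lines


# Invented read lengths standing in for a real library.
toy_lengths = [28] * 5 + [29] * 10 + [30] * 8 + [40] * 7
counts, fraction = read_length_distribution(toy_lengths)
print("\n".join(text_histogram(counts)))
print(f"fraction of reads in the expected range: {fraction:.2f}")
```

A visible peak well outside the expected range, like the 40-length reads in the toy data, would be consistent with incomplete adapter trimming and is worth ruling out before interpreting the ORF calls.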

HiteshKore commented 2 years ago

Thanks a lot for your help. Is there any explanation for why ribotricer is unable to predict the ORFs identified by RiboTaper?

saketkc commented 2 years ago

It's hard to comment without taking a closer look at the data and ribotricer's output. If you can share your data and scripts to reproduce the analyses, I can take a look. You can email me at schoudhary@nygenome.org

saketkc commented 2 years ago

Closing as I haven't heard back.