novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

How to consider a site as modified? #124

Closed rania-o closed 1 year ago

rania-o commented 2 years ago

Hello,

I have some questions about the outputs of EpiNano. I used the sum-error parameter, and I would like to know whether a site has to show up in the results of both the delta-sum-error method and the linear model to be considered "modified", or whether being called modified by one of the two is enough. I also have a question about the results of the linear model: I don't understand what the column "lm_Bonferroni_outlier_test" corresponds to, or on what basis it calls a site modified or not. Still on the linear model results: to consider a position "modified", do both columns "lm_Bonferroni_outlier_test" and "lm_residuals_z_scores_prediction" have to give "mod", or is one of them enough?

Thanks for your help. Rania

enovoa commented 1 year ago

Dear @rania-o, sorry for the slow reply. EpiNano offers different criteria for considering a site as modified: one option is the delta-sum-error method and another is the linear model. The outputs will not be identical (but should share some sites), as the two methods rely on different assumptions. Hope this clarifies your doubts!
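To see how much the two methods agree on a given dataset, one can compare the sets of sites each of them calls modified. The snippet below is only a minimal sketch: the site identifiers are hypothetical, and it assumes the "mod" sites from each output file have already been extracted into plain Python lists.

```python
def compare_calls(delta_sites, lm_sites):
    """Summarize agreement between two collections of site identifiers."""
    delta_sites, lm_sites = set(delta_sites), set(lm_sites)
    shared = delta_sites & lm_sites
    union = delta_sites | lm_sites
    return {
        "shared": sorted(shared),
        "delta_only": sorted(delta_sites - lm_sites),
        "lm_only": sorted(lm_sites - delta_sites),
        # Jaccard index: fraction of all called sites that both methods agree on
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical site identifiers, for illustration only
delta_sites = ["seq1 55 T +", "seq1 102 A +", "seq1 310 A +"]
lm_sites = ["seq1 55 T +", "seq1 310 A +", "seq1 411 G +"]

summary = compare_calls(delta_sites, lm_sites)
print(summary)
```

A low Jaccard index does not say which method is right; it only quantifies how far apart the two sets of calls are.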

rania-o commented 1 year ago

Hi @enovoa,

Thank you for your reply. I still don't understand which method I should rely on, as I've got only a few positions in common between the two methods.

Also, I still have the same question about the results of the linear model: to consider a position "modified", do both columns "lm_Bonferroni_outlier_test" and "lm_residuals_z_scores_prediction" have to give "mod", or is one of them enough?

Thank you, Rania

enovoa commented 1 year ago

Hi @rania-o, I cannot say which of them you should use, as it depends on the specific data type, modification, etc. that you are working with. Detection of RNA modifications varies depending on the stoichiometry of the sites, the sequence context, the coverage, and the modification type, to name a few variables. Ideally you should have sequenced some control (e.g. an RNA with and without a given RNA modification) in your own dataset to be able to judge the performance of each method on your own data. If you haven't done so, you may wish to try out the demo data to see how each method performs on data for which the RNA modification is known. You can also download public data for which you know the ground truth (e.g. you can use total RNA sequencing in WT and snoRNA KOs from Begik, Lucas et al. Nat Biotech 2021, https://www.ebi.ac.uk/ena/browser/view/PRJEB37798?show=reads). Hope that helped! For the second question, @Huanle might be able to clarify better than me. Thanks, Eva

rania-o commented 1 year ago

Hi @enovoa,

Thanks a lot for these clarifications. I have already used the demo data and other public data to test tools, and yes, we do have a control sample (IVT), but it's always hard to choose a method for de novo detection. For the other question, I'll wait for @Huanle's answer.

Thanks again, Rania

enovoa commented 1 year ago

Hi @rania-o, the control sample that you mention above is there so that you can run your samples in a pairwise manner; that is not what I am referring to. I mean an internal control, e.g. a modified and unmodified oligo for which you know the ground truth. If you don't have an INTERNAL control inside your own run, you may wish to test the demo data and/or publicly available datasets for which the ground truth is known or orthogonal data are available.

rania-o commented 1 year ago

Aaah, yes, I see what you mean. Indeed, we already have this type of control (an oligo with two known modified positions), but unfortunately none of the tools I have tested gives me only those two positions. There are always false positives, and sometimes I don't even find my two known positions. That's why, for de novo detection, I was wondering whether in your data you noticed that one method is more reliable than the other for viral transcripts.

enovoa commented 1 year ago

Sorry, I cannot provide advice on which option(s) are best for analyzing your own data. As I said, the performance varies depending on sequence context, modification type, stoichiometry, etc. I would recommend testing how each method/algorithm performs on your internal control (which you seem to have) as well as on public data that is similar to your current data, and basing your decisions on those results.
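As an illustration of what such a test on an internal control could look like, the sketch below scores each method's calls against the known modified positions. Everything in it is hypothetical: the positions, the method labels, and the assumption that each method's "mod" calls on the control oligo have already been collected into sets.

```python
def score_calls(called, truth):
    """Compare called positions with known modified positions."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # known sites that were called
    fp = len(called - truth)   # called sites that are not known to be modified
    fn = len(truth - called)   # known sites that were missed
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical control oligo with two known modified positions
truth = {120, 245}

# Hypothetical calls from each method on that oligo
calls = {
    "delta_sum_err": {120, 245, 300, 410},
    "linear_model": {120, 77},
}

scores = {name: score_calls(sites, truth) for name, sites in calls.items()}
for name, result in scores.items():
    print(name, result)
```

In this made-up example the first method recovers both known sites at the cost of two false positives, while the second misses one of them; which trade-off is acceptable depends on the analysis.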

rania-o commented 1 year ago

Yes, I understand your position: it is difficult to have fixed parameters or methods for all analyses, since the right choice depends on the type of data. Thank you for your time and clarification. Rania

DelphIONe commented 1 year ago

Thanks for your tool! I have the same question as Rania: "Also, I still have the same question about the results of the linear model, in order to consider a position "modified", do both columns "lm_Bonferroni_outlier_test" and "lm_residuals_z_scores_prediction" have to give "mod" or is one of them enough?" What does it mean when a position gives this type of result:

seq1 55 T +,0.36707,0.77701,unm,0.396752762342349,3.07227008384971,mod

Should I consider this position modified or not? Can the other file (*delta-sum_err.prediction.csv) help me make a decision?

Thanks for your reply!

enovoa commented 1 year ago

There are multiple criteria offered by EpiNano to consider a site as "positive" or not. The decision will depend on whether you prioritize sensitivity or specificity. It will also vary depending on the modification type. We offer more than one metric to assess which sites are differentially modified; it is up to the user to decide which one to use. You should have some ground truth in your own data to guide such decisions. You can also use the demo data (but it will be for a specific modification). Hope that helped!
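The sensitivity/specificity trade-off can be made concrete with the example row quoted earlier in this thread. This is only a sketch under stated assumptions: it assumes the fourth field of the row is the "lm_Bonferroni_outlier_test" call and the last field is the "lm_residuals_z_scores_prediction" call (the thread does not confirm the column order), and `combine_calls` is a hypothetical helper, not part of EpiNano.

```python
def combine_calls(bonferroni_call, zscore_call, policy="either"):
    """Combine two per-site calls ("mod"/"unm") into one decision.

    policy="either" favours sensitivity (one "mod" is enough);
    policy="both" favours specificity (both columns must say "mod").
    """
    flags = [bonferroni_call == "mod", zscore_call == "mod"]
    if policy == "either":
        return any(flags)
    if policy == "both":
        return all(flags)
    raise ValueError(f"unknown policy: {policy!r}")


# Example row from this thread
row = "seq1 55 T +,0.36707,0.77701,unm,0.396752762342349,3.07227008384971,mod"
fields = row.split(",")

# ASSUMPTION: column positions are inferred from the row, not from EpiNano docs
bonferroni_call = fields[3]   # assumed lm_Bonferroni_outlier_test
zscore_call = fields[-1]      # assumed lm_residuals_z_scores_prediction

lenient = combine_calls(bonferroni_call, zscore_call, policy="either")
strict = combine_calls(bonferroni_call, zscore_call, policy="both")
print(lenient, strict)  # True False
```

Under the lenient policy this site counts as modified and under the strict one it does not. Ground truth from a control is what would show which policy suits a given dataset.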