novoalab / EpiNano

Detection of RNA modifications from Oxford Nanopore direct RNA sequencing reads (Liu*, Begik* et al., Nature Comm 2019)
GNU General Public License v2.0

How are replicates handled? #81

Closed samir-watson closed 2 years ago

samir-watson commented 3 years ago


I have several replicates of WT and KO samples that I would like to compare with Epinano_DiffErr, but I am unsure at what point I should define and input the replicates. I started with mapped BAM files for each replicate and ran epinano_variants separately on each file, which gave me one output file per replicate. However, when I pass multiple CSV files to the -k and -w options of Epinano_DiffErr, the script does not recognise them as replicates. I have looked in the documentation but could not find information on how replicates should be treated.

Thanks, Samir

Huanle commented 3 years ago

Hi @samir-watson ,

You made a good point. Regarding sample-replicates, we did propose a method based on Epinano_predict predictions in the publication.

I haven't come up with a good strategy for epinano_differr. I think you could either pool all your K samples and all your W samples and run the analysis on the resulting super-K and super-W, or filter/combine the results from independent K-W pairs to reach a consensus. Hope this makes sense.
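The second option (a consensus across independent K-W pairs) could be sketched as below. This is only an illustration and not part of EpiNano: the function name `consensus_sites`, the `min_pairs` parameter, and the representation of each pair's significant sites as a set of `(reference, position)` tuples are all assumptions, and extracting those sites from the per-pair output is left to the reader.

```python
from collections import Counter


def consensus_sites(calls_per_pair, min_pairs=None):
    """Return sites called in at least `min_pairs` of the K-W comparisons.

    calls_per_pair: list of collections, one per K-W pair, each holding
    hashable site identifiers, e.g. (reference, position) tuples (hypothetical).
    min_pairs: minimum number of pairs that must support a site;
    defaults to all pairs (strict intersection).
    """
    if min_pairs is None:
        min_pairs = len(calls_per_pair)
    # set() ensures a site is counted at most once per pair
    counts = Counter(site for calls in calls_per_pair for site in set(calls))
    return {site for site, n in counts.items() if n >= min_pairs}


# Made-up example with three K-W pairs
pairs = [
    {("chr1", 10), ("chr1", 20)},
    {("chr1", 10)},
    {("chr1", 10), ("chr1", 20)},
]
strict = consensus_sites(pairs)                # supported by every pair
relaxed = consensus_sites(pairs, min_pairs=2)  # supported by at least 2 pairs
```

Requiring every pair is the most conservative choice; lowering `min_pairs` trades specificity for sensitivity.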

samir-watson commented 3 years ago

Hi Huanle, thanks, yes, that makes sense. I'll go ahead and combine the results from independent K-W pairs. Cheers, Samir

samir-watson commented 2 years ago

Hi, I have another question regarding replicates, this time using epinano-SVM. In your paper you used the pseudocode

if (s1 ≥ 0.5 and s2 ≥ 0.5 and s3 ≥ 0.5):
    M = 1
else:
    M = (s1 + s2 + s3)/3

where I'm assuming "M" is the ProbM column generated by Epinano_Predict.

Then you use pseudocode 2 to determine modification status

Pseudocode 2:

if (Mwt/Mko) > 1.5 and Mwt > 0.5:
    status = modified
else:
    status = unmodified

For pseudocode 2, I am assuming that this is also what Epinano_Predict does, since it gives a mod and unmod column. Is this correct? Or is pseudocode 2 used specifically for replicates?

My question then relates to using the averaged ProbM to generate delta features. Looking at, it requires information on the mis1,mis2,mis3,mis4,mis5,ins1,ins2,ins3,ins4,ins5,del1,del2,del3,del4,del5 columns, which were not accounted for in the pseudocode and thus not averaged. I am now confused about how the replicates are averaged; could you please explain how you went about it?

Huanle commented 2 years ago

> Hi, I have another question regarding replicates, this time using epinano-SVM. In your paper you used the pseudocode
>
> if (s1 ≥ 0.5 and s2 ≥ 0.5 and s3 ≥ 0.5):
>     M = 1
> else:
>     M = (s1 + s2 + s3)/3
>
> where I'm assuming "M" is the ProbM column generated by Epinano_Predict.

M is the final modification score, determined from the separate ProbM values (s1, s2, s3) computed with Epinano_Predict for each sample.
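For illustration, pseudocode 1 could be written in Python roughly as follows. This is a minimal sketch rather than code shipped with EpiNano: the function name `combined_score` and the `threshold` parameter are hypothetical, and it assumes the per-replicate ProbM values for one site have already been collected into a list.

```python
def combined_score(scores, threshold=0.5):
    """Combine per-replicate ProbM values for one site (pseudocode 1).

    Returns 1.0 when every replicate reaches the threshold,
    otherwise the arithmetic mean of the replicate scores.
    """
    if not scores:
        raise ValueError("need at least one replicate score")
    if all(s >= threshold for s in scores):
        return 1.0
    return sum(scores) / len(scores)


# Made-up scores for two sites, three replicates each
m_consistent = combined_score([0.6, 0.7, 0.9])  # all replicates >= 0.5
m_mixed = combined_score([0.6, 0.2, 0.1])       # falls back to the mean
```

Taking a list instead of exactly three arguments lets the same rule apply to any number of replicates, which is a generalisation beyond the three-replicate pseudocode.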

> Then you use pseudocode 2 to determine modification status
>
> Pseudocode 2:
>
> if (Mwt/Mko) > 1.5 and Mwt > 0.5:
>     status = modified
> else:
>     status = unmodified
>
> For pseudocode 2, I am assuming that this is also what Epinano_Predict does, since it gives a mod and unmod column. Is this correct? Or is pseudocode 2 used specifically for replicates?

Mwt and Mko are the M values determined for the WT and KO samples, respectively, using the method described by pseudocode 1.
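Pseudocode 2 could likewise be sketched in Python. Again this is an illustration, not EpiNano code: the function name `modification_status` and the parameter names are hypothetical, and the handling of Mko = 0 (treating the ratio as exceeding the cutoff) is an assumption, since the pseudocode does not cover that case.

```python
def modification_status(m_wt, m_ko, ratio_cutoff=1.5, min_wt=0.5):
    """Classify one site from its combined WT and KO scores (pseudocode 2)."""
    if m_wt <= min_wt:
        return "unmodified"
    if m_ko == 0:
        # Assumption: a zero KO score is treated as an unbounded WT/KO ratio
        return "modified"
    return "modified" if m_wt / m_ko > ratio_cutoff else "unmodified"


# Made-up combined scores
status_a = modification_status(1.0, 0.4)  # ratio 2.5, WT score above 0.5
status_b = modification_status(0.6, 0.5)  # ratio 1.2, below the 1.5 cutoff
status_c = modification_status(0.4, 0.1)  # WT score not above 0.5
```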

> My question then relates to using the averaged ProbM to generate delta features. Looking at, it requires information on the mis1,mis2,mis3,mis4,mis5,ins1,ins2,ins3,ins4,ins5,del1,del2,del3,del4,del5 columns, which were not accounted for in the pseudocode and thus not averaged. I am now confused about how the replicates are averaged; could you please explain how you went about it?

You can use delta features and Epinano_Predict to make predictions. Please refer to the examples included in to carry out your analysis.

Please let me know if you have more questions.