How are replicates handled?

samir-watson commented 3 years ago

Hi,

I have several replicates of WT and KO samples that I would like to compare with Epinano_DiffErr, but I am unsure at what point I should define and input the replicates. I started with mapped BAM files for each replicate and ran epinano_variants separately on each file, which gave me individual .aligned.plus_strand.per.site.csv files. However, when I pass multiple csv files to the -k and -w options of Epinano_DiffErr, the script does not recognise them as replicates. I have looked in the documentation but could not find any information on how replicates should be handled.

Thanks, Samir

Huanle commented 3 years ago

Hi @samir-watson ,

You make a good point. For sample replicates, we did propose a method based on Epinano_Predict predictions in the publication.

I haven't come up with a good strategy for epinano_differr yet. I think you could either combine/pool all your K and W samples and run the analysis on a super-K and a super-W, or filter/combine the results from independent K-W pairs to reach a consensus. Hope this makes sense.
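The second option, a consensus across independent K-W pairs, could look like the minimal sketch below. It is not part of EpiNano. It assumes you have already reduced each pair's Epinano_DiffErr output to a set of candidate site keys such as (reference, position, strand). The helper name `consensus_sites` and the `min_pairs` threshold are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def consensus_sites(pair_calls: Iterable[Set[Hashable]], min_pairs: int) -> Set[Hashable]:
    """Return sites flagged in at least `min_pairs` independent K-W comparisons.

    `pair_calls` holds one set of candidate sites per K-W pair (hypothetical
    input, e.g. (reference, position, strand) tuples parsed from each
    DiffErr result file).
    """
    counts = Counter()
    for calls in pair_calls:
        counts.update(set(calls))  # count each site at most once per pair
    return {site for site, n in counts.items() if n >= min_pairs}


# Example with three hypothetical K-W pairs
pair1 = {("rRNA", 100, "+"), ("rRNA", 250, "+")}
pair2 = {("rRNA", 100, "+"), ("rRNA", 400, "+")}
pair3 = {("rRNA", 100, "+"), ("rRNA", 250, "+")}
print(sorted(consensus_sites([pair1, pair2, pair3], min_pairs=2)))
# [('rRNA', 100, '+'), ('rRNA', 250, '+')]
```

Setting `min_pairs` to the number of pairs keeps only sites found in every pair, which is the intersection. Lower values trade stringency for sensitivity.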

samir-watson commented 3 years ago

Hi Huanle, thanks, yes that makes sense. I'll go ahead and combine the results from independent K-W pairs. Cheers, Samir

samir-watson commented 2 years ago

Hi, I have another question regarding replicates, this time using epinano-SVM. In your paper you used the following pseudocode (pseudocode 1):

    if s1 >= 0.5 and s2 >= 0.5 and s3 >= 0.5:
        M = 1
    else:
        M = (s1 + s2 + s3) / 3

where I'm assuming "M" is the ProbM column generated by Epinano_Predict.

Then you use pseudocode 2 to determine modification status.

Pseudocode 2:

    if Mwt / Mko > 1.5 and Mwt > 0.5:
        status = "modified"
    else:
        status = "unmodified"

For pseudocode 2, I assume this is also what Epinano_Predict does, since it outputs mod and unmod columns. Is this correct? Or is pseudocode 2 used specifically for replicates?

My question then relates to using the averaged ProbM to generate delta features. Epinano_make_delta.py requires the mis1,mis2,mis3,mis4,mis5,ins1,ins2,ins3,ins4,ins5,del1,del2,del3,del4,del5 columns. The pseudocode does not account for these columns, so they are not averaged. Since I am now unsure how you averaged the replicates, could you please explain how you went about it?

Huanle commented 2 years ago

> Hi, I have another question regarding replicates, this time using epinano-SVM. In your paper you used the following pseudocode (pseudocode 1):
>
>     if s1 >= 0.5 and s2 >= 0.5 and s3 >= 0.5:
>         M = 1
>     else:
>         M = (s1 + s2 + s3) / 3
>
> where I'm assuming "M" is the ProbM column generated by Epinano_Predict.

M is the final modification score. It is determined from the separate ProbMs (s1, s2, s3) that Epinano_Predict computes for each sample.
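Pseudocode 1 can be written as a small Python function that takes the ProbM of one site from each replicate. This is a sketch under stated assumptions. The function name is hypothetical. The function also generalises the paper's three-replicate rule (s1, s2, s3) to any number of replicates, which goes beyond what the paper states.

```python
from statistics import mean
from typing import Sequence


def combine_replicate_probm(probms: Sequence[float], cutoff: float = 0.5) -> float:
    """Pseudocode 1: final modification score M for one site.

    `probms` holds the per-replicate ProbM values (s1, s2, s3, ...) that
    Epinano_Predict computed separately for each sample at this site.
    """
    if not probms:
        raise ValueError("need at least one replicate")
    if all(s >= cutoff for s in probms):
        return 1.0  # every replicate supports modification
    return mean(probms)  # otherwise fall back to the plain average


print(combine_replicate_probm([0.8, 0.6, 0.9]))   # 1.0
print(combine_replicate_probm([0.75, 0.25, 0.5]))  # 0.5
```

The rule is asymmetric. Unanimous support fixes M at 1. A single replicate below the cutoff drops M to the average, which can still be fairly high.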

> Then you use pseudocode 2 to determine modification status.
>
> Pseudocode 2:
>
>     if Mwt / Mko > 1.5 and Mwt > 0.5:
>         status = "modified"
>     else:
>         status = "unmodified"
>
> For pseudocode 2, I am assuming that this is what Epinano_Predict also does as it gives a mod and unmod column, is this correct? Or is pseudocode 2 specifically used for replicates?

Mwt and Mko are the M values determined for the WT and KO samples, respectively, using the method in pseudocode 1.
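Once each condition has its own M, pseudocode 2 compares the WT and KO scores site by site. In the sketch below, the function and argument names are hypothetical. The ratio test Mwt / Mko > 1.5 is rewritten as Mwt > 1.5 * Mko. The two forms are equivalent whenever Mko > 0, and the product form avoids a division by zero when the KO score is 0. Handling Mko = 0 this way is an assumption; the paper does not state it.

```python
def modification_status(m_wt: float, m_ko: float,
                        ratio_cutoff: float = 1.5,
                        score_cutoff: float = 0.5) -> str:
    """Pseudocode 2: call one site from the combined WT and KO scores.

    m_wt and m_ko are the pseudocode-1 scores M for the WT and KO replicates.
    The ratio test m_wt / m_ko > ratio_cutoff is written as a product so that
    m_ko == 0 does not raise ZeroDivisionError (assumes scores are >= 0).
    """
    if m_wt > ratio_cutoff * m_ko and m_wt > score_cutoff:
        return "modified"
    return "unmodified"


print(modification_status(1.0, 0.5))  # modified (ratio 2.0)
print(modification_status(0.6, 0.5))  # unmodified (ratio 1.2)
print(modification_status(0.4, 0.0))  # unmodified (Mwt not above 0.5)
```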

> My question then relates to using the averaged ProbM to generate delta features. Looking at Epinano_make_delta.py, it requires information on the mis1,mis2,mis3,mis4,mis5,ins1,ins2,ins3,ins4,ins5,del1,del2,del3,del4,del5 columns which were not accounted for in the pseudocode and thus not averaged. As I am now confused about how you go about averaging out the replicates could you please explain to me how you went about it?

You can use delta features and Epinano_Predict to make predictions. Please refer to the examples included in https://github.com/novoalab/EpiNano/blob/e1a538eb0cdb36626a9060ea43bc8292d8491a3b/test_data/make_predictions/run.sh#L49 to carry out your analysis.
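One possible reading of this answer is to skip averaging the raw mis/ins/del columns altogether. Instead, you would build delta features for each WT-KO replicate pair and run Epinano_Predict on each pair. The resulting per-pair ProbMs could then be combined with pseudocode 1. That workflow is an interpretation, not a confirmed description of the authors' pipeline.

The sketch below also assumes that each delta feature is the WT minus KO difference of the same column at the same site. Check Epinano_make_delta.py and the linked run.sh for the exact definition, window handling and file format. The site key, `FEATURES` list and helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Column names taken from the question; Epinano_make_delta.py may use
# different names or additional columns (hypothetical list).
FEATURES = [f"{kind}{i}" for kind in ("mis", "ins", "del") for i in range(1, 6)]

Site = Tuple[str, int]  # (reference, position): an assumed site key
Row = Dict[str, float]


def delta_features(wt: Dict[Site, Row], ko: Dict[Site, Row]) -> Dict[Site, Row]:
    """WT minus KO for each feature, at sites present in both samples."""
    shared = wt.keys() & ko.keys()
    return {site: {f: wt[site][f] - ko[site][f] for f in FEATURES}
            for site in sorted(shared)}


def deltas_per_pair(pairs: Iterable[Tuple[Dict[Site, Row], Dict[Site, Row]]]) -> List[Dict[Site, Row]]:
    """One delta table per independent WT-KO replicate pair (no feature averaging)."""
    return [delta_features(wt, ko) for wt, ko in pairs]


wt = {("rRNA", 100): {f: 0.5 for f in FEATURES}}
ko = {("rRNA", 100): {f: 0.25 for f in FEATURES},
      ("rRNA", 101): {f: 0.25 for f in FEATURES}}
d = delta_features(wt, ko)
print(list(d))  # [('rRNA', 100)]
```

Sites missing from either sample are dropped rather than imputed. This keeps each delta an observed difference, at the cost of coverage.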

Please let me know if you have more questions.

novoalab / EpiNano

How are replicates handled? #81