How to understand eventalign samples output

jorgcalis commented 4 years ago

Dear developers/@jts,

Thanks for your great work making/maintaining nanopolish. It is a pleasure using nanopolish. I am looking at direct RNA-seq data with eventalign, and want to work with the per sample raw data. Eventually, to compare the measurements at the same reference position from different reads. I looked at the high-level explanation blog, but there are still some parts of the output that I do not fully understand:

Here bits of a single line of output. reference_kmer: CAGGT event_index: 588 event_level_mean: 104.48 event_stdv: 4.682 event_length: 0.00299 model_mean: 98.91 model_stdv: 9.52 standardized_level: 0.53 samples: 116.896,118.601,120.151,131.465,124.336,117.361,118.136,119.376,129.605

My specific questions:

What explains the difference between the event_level_mean and the mean of all samples (104.48 vs 121.77)? Do I need to format the per sample (raw?) data to the event_level_mean for a better comparison of per sample measurements between reads?
What does the standardized_level represent? I read that it is (event_level_mean-model_mean)/model_stdv, but that gives me 0.585, not 0.53. What other factors should I take into account here?

Feel free to point me to some area where this has been explained before. Thanks for your help and keep up the good work!

Best, Jorg

jts commented 4 years ago

Thanks, I'm glad to hear you like the program.

Both of your questions are due to how nanopolish scales the reads to account for per-read variations in signal. Internally nanopolish scales the pore model (model_mean, model_stdv) instead of scaling the events (the reason why we do this is a bit lengthy but I'm happy to explain if it helps your work). The samples output however is scaled, which is why the mean of the samples doesn't match event_mean. If you pass the --scale-events flag both samples and event_mean will be scaled and columns will be consistent.

Similarly the standardized_level column has an additional scaling parameter (var) applied:

https://github.com/jts/nanopolish/blob/master/src/alignment/nanopolish_eventalign.cpp#L460

The best reference for how the scaling works is in the supplement of our methylation calling paper:

https://static-content.springer.com/esm/art%3A10.1038%2Fnmeth.4184/MediaObjects/41592_2017_BFnmeth4184_MOESM258_ESM.pdf

I hope this helps, let me know if you have any followup questions or if this in unclear.

Jared

jorgcalis commented 4 years ago

Dear Jared,

Thanks for a quick response.

If I understand it correctly, the scaling will work to make the samples of different reads best comparable. The unscaled values in the output are event_index, event_level_mean and event_stdv, unless you pass --scale-events, than all output will be scaled. I'll go through the reference and let you know if I have any followup questions.

Thanks again! Best, Jorg

iremtll commented 2 years ago

Dear developers,

why are there several values (samples) for one read? Is this how nanopore works?

Thanks a lot! Best

jts / nanopolish

How to understand eventalign samples output #717