hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Are pA values from slow5tools normalized? #79

Closed chilampoon closed 2 years ago

chilampoon commented 2 years ago

Hi there, I am looking at normalization methods for ONT raw signals. I am not clear whether the picoamp values obtained from slow5tools seq_reads(pA=True) are already normalized. I've also tried the normalization in tombo using their function tombo_stats.normalize_raw_signal, which scales the raw signal values to around 0.

It seems that using either the pA values or the normalized raw signal values didn't make much difference to my downstream analysis, but if I'd like to normalize the squiggles of my dataset globally, which method would you suggest? Thanks.

Psy-Fer commented 2 years ago

Hello,

pA conversion is not the same as normalisation.

pA conversion is handled by the following

  1. scale = range / digitisation
  2. pA = scale * (raw_signal + offset)

This gives you a positive float value, like the one you get from the pA conversion in any of the slow5-associated tools.
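The two conversion steps above can be sketched in Python as follows. This is a minimal illustration assuming NumPy is available; the function name `to_picoamps` and its parameter names are hypothetical, and the `digitisation`, `offset` and `range` values are assumed to come from each read's own metadata.

```python
import numpy as np

def to_picoamps(raw_signal, digitisation, offset, range_):
    """Convert raw integer signal values to picoamps (pA).

    Hypothetical helper, not part of any slow5 tool. `range_` has a
    trailing underscore only to avoid shadowing Python's built-in range.
    """
    # Step 1: scale = range / digitisation
    scale = range_ / digitisation
    # Step 2: pA = scale * (raw_signal + offset)
    raw = np.asarray(raw_signal, dtype=np.float64)
    return scale * (raw + offset)

# Made-up example values, for illustration only
pa = to_picoamps([0, 100], digitisation=8192.0, offset=10.0, range_=1024.0)
```

Because the three parameters are stored per read, the conversion should be applied read by read rather than with one set of values for the whole dataset.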

Normalisation is usually one of two kinds. In the early days it was z-normalisation, but now most tools use the median and median absolute deviation (med-mad).

You can see the supplementary plots in the SquiggleKit paper for how it impacts downstream analyses.

So if you are doing global normalisation, first do the pA conversion, as this gets all the raw DAC values into the same range; then, when you normalise, I'd use med-mad.
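A rough sketch of med-mad normalisation applied to a signal that has already been converted to pA, assuming NumPy. The function name `med_mad_normalise` and the `mad_scale` parameter are hypothetical; some tools multiply the MAD by a constant such as 1.4826 to make it comparable to a standard deviation, so the default of 1.0 here is an assumption, not what any particular tool does.

```python
import numpy as np

def med_mad_normalise(pa_signal, mad_scale=1.0):
    """Centre a pA signal on its median and scale by its MAD.

    Hypothetical helper. mad_scale=1.0 uses the plain median absolute
    deviation; pass 1.4826 if a std-like scale is preferred (assumption).
    """
    x = np.asarray(pa_signal, dtype=np.float64)
    med = np.median(x)
    # Median absolute deviation: median of |x - median(x)|
    mad = np.median(np.abs(x - med)) * mad_scale
    if mad == 0:
        raise ValueError("MAD is zero; signal cannot be scaled")
    return (x - med) / mad

# Made-up pA values with one large outlier, for illustration only
normalised = med_mad_normalise([1.0, 2.0, 3.0, 4.0, 100.0])
```

The median and MAD are barely moved by occasional spikes in the signal, which is the usual argument for med-mad over z-normalisation (mean and standard deviation).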

I hope this answers your question. Let me know if I missed something or if you have any other questions.

James

chilampoon commented 2 years ago

I see, I'll do med-mad on pA values then, thank you so much James!