gringer / bioinfscripts

Bioinformatics scripts produced over the course of my work. Now maintained on GitLab.
https://gitlab.com/gringer/bioinfscripts
GNU General Public License v3.0

help explanation #4

Closed jordur closed 6 years ago

jordur commented 7 years ago

example.tar.gz

Dear David, I have been trying out your scripts for manipulating raw nanopore sequences (Twitter, such a nice tool ;) ). I used the following command lines just to play with them:

python porejuicer.py raw ugm07_20170510_FNFAE31754_MN19775_mux_scan_Bjar_1005_85455_ch102_read12_strand.fast5 > example/example.raw

signal_viewer.r example/example.raw

I obtained the attached files, and I was wondering how to explain the drop in raw signal that you can see in the read. I am planning to do this for some reads that we suspect could be chimeric nanopore reads, in order to trim them or discard them before downstream analysis. Could you help me with this? Thank you in advance, and congratulations on your nice work.

gringer commented 7 years ago

That's most likely a pore reversal; the nanopore software will detect pores that have unusual signal and attempt to unblock the pores if possible. My general impression from looking at this raw signal is that it's not behaving like a real DNA sequence because there's not enough regularity in the pattern -- it's either moving too fast, or there's nothing there. The expanded signal graph (signal_out_example.pdf) shows something similar to what I expect, so perhaps I'm just not used to seeing the usual signal from the current software.

jordur commented 7 years ago

Difficult to test that on each *.fast5 file, isn't it? By the way, why did you say that "it's either moving too fast, or there's nothing there"? Could you tell me how/where to get more information about that, in order to detect those "non-real" reads? From the nanopore community? Thank you in advance.

gringer commented 6 years ago

Sorry, this comment slipped out of my consciousness while I was concentrating on other stuff. Here's a bit of a brain dump off the top of my head, just in case I don't return to this again for another few months:

I can't think of any other resources; most people are not looking at the raw signal from reads. Maybe nanoraw, or its ONT-assimilated younger brother tombo, could help you out.

I have a poster showing results from some of the earliest runs we made. We found that the error profile of nanopore reads (i.e. what is left behind when you subtract a signal from its running median) looked very normal, typically no more than a few picoamps around the "true" value:

https://f1000research.com/posters/4-865
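As a rough illustration of "subtracting a signal from its running median", here is a minimal Python sketch. The window size, the function names and the toy data are all assumptions made for the example; the poster's exact method isn't described in this thread.

```python
from statistics import median

def running_median(signal, window=21):
    """Median over a centred window; the window is truncated at the edges.
    The default window size is an arbitrary assumption, not a recommended value."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]

def residuals(signal, window=21):
    """What is left behind after subtracting the running median from the signal."""
    smoothed = running_median(signal, window)
    return [s - m for s, m in zip(signal, smoothed)]

# Toy values (made up): a steady level with one outlying sample
signal = [100.0, 101.0, 99.0, 100.0, 130.0, 100.0, 101.0, 99.0, 100.0]
res = residuals(signal, window=5)
```

With real data, a histogram of `res` would show whether the residuals stay within a few picoamps of zero.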

If the raw signal is very flat (possibly punctuated by occasional peaks), then there's not much change happening, and probably no DNA... but that's not the case for your data (well, it is, but only at the very start of the signal).
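One way to put a number on "very flat, possibly punctuated by occasional peaks" is the median absolute deviation, which largely ignores rare spikes. This is only a sketch: the threshold, the function names and the toy values are assumptions, and a sensible cut-off would need to be tuned on real reads.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: a spread estimate that is robust to a few peaks."""
    values = list(values)
    centre = median(values)
    return median(abs(v - centre) for v in values)

def looks_flat(signal, threshold=1.0):
    """True if the robust spread is below `threshold` (an assumed, untuned cut-off)."""
    return mad(signal) < threshold

# Toy values (made up): a flat trace with one peak, and a trace that keeps changing
flat = [80.0, 80.5, 79.5, 80.0, 150.0, 80.0, 79.5, 80.5, 80.0]
busy = [80.0, 95.0, 70.0, 110.0, 85.0, 60.0, 100.0, 75.0, 90.0]
```

Here `looks_flat(flat)` is true despite the single peak, while `looks_flat(busy)` is false.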

DNA sequences that are sampled at 4 kHz and move through the pore at 450 bases per second should have about 8-12 samples per base. In an ideal world it'd be 8.9 samples per base (4000 / 450), but DNA secondary structure and enzyme efficiencies get in the way of that. If this "time per base" becomes too variable, then base-callers have a lot more difficulty converting the signal into a proper base call.
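The arithmetic behind that figure, as a small Python sketch. The function names and the 40,000-sample example are made up for illustration; only the 4 kHz and 450 bases per second values come from the comment above.

```python
def samples_per_base(sample_rate_hz, bases_per_second):
    """Ideal number of raw samples covering each base."""
    return sample_rate_hz / bases_per_second

def approx_bases(n_samples, sample_rate_hz=4000, bases_per_second=450):
    """Rough base count for a stretch of signal, assuming constant speed."""
    return n_samples / samples_per_base(sample_rate_hz, bases_per_second)

ideal = samples_per_base(4000, 450)   # 4000 / 450, about 8.9
estimate = approx_bases(40000)        # 10 seconds of signal, about 4500 bases
```

Real reads drift away from the ideal value, so `approx_bases` is at best a sanity check on the length of a base-called sequence.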

gringer commented 6 years ago

Here's a comparison of 10kHz signal and 4kHz signal, showing the initial adapter part of the read (from RBK-002):

[Figure: Raw signal - 10k_vs_4k]

And for comparison, your raw signal, across approximately the same interval at the start of the read. The valleys and plateaus look normal to me.

[Figure: Raw signal - jordur, 4k]

Here's the signal around the area of the spike, where I've reduced the magnitude of the spike (but still left a little bit in):

[Figure: Raw signal - jordur, at flip point]

Note that there's no stall point (no time when the pore is flat-lining for a while), and the hump expected at the start of a read doesn't exist. This suggests to me that it's a continuation of a single read which got hit by a flick (sharp negative/positive change in voltage, with the goal of clearing blocked pores), and the flick wasn't strong enough to kill the DNA and/or eject the read out of the pore.
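Checking for a stall point across many reads could be automated along these lines. This is a minimal Python sketch, not a tested detector: the tolerance, the minimum run length, the function name and the toy values are all assumptions.

```python
def flat_runs(signal, tolerance=1.0, min_length=5):
    """Return (start, end) index pairs (end exclusive) for runs in which every
    sample stays within `tolerance` of the first sample of the run.
    Both parameters are assumed values and would need tuning on real reads."""
    runs = []
    start = 0
    for i in range(1, len(signal) + 1):
        # Close the current run at the end of the signal, or when a sample
        # moves too far away from the first sample of the run.
        if i == len(signal) or abs(signal[i] - signal[start]) > tolerance:
            if i - start >= min_length:
                runs.append((start, i))
            start = i
    return runs

# Toy values (made up): a trace that flat-lines for six samples, and one that never does
with_stall = [90.0, 70.0, 100.0, 80.0, 80.2, 79.9, 80.1, 80.0, 80.3, 110.0, 60.0]
no_stall = [90.0, 70.0, 100.0, 85.0, 60.0, 110.0]

stalls = flat_runs(with_stall)
none_found = flat_runs(no_stall)
```

An empty result near a spike would be consistent with "no stall point", i.e. a single read that carried on after the flick. At about 9 samples per base, `min_length` would need to be much larger than in this toy example.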

I'm not sure why there's low signal just after the flick, but perhaps it's just the MinION taking a bit of time to recalibrate itself after the voltage change.