Closed peiyihe closed 5 months ago
The event detection algorithm and those parameters were taken from Scrappie. It uses two sets of rolling window t-tests and places event boundaries when either t-statistic reaches a peak above the threshold, so "window_length1" corrasponds to "threshold1". "peak_height" is used by the peak finding algorithm for both tests. I'll include a figure I made to illustrate this below.
For slower sequencing speeds, I just increased the window lengths and thresholds ~proportionally to the increase in expected dwell time. I also measured skip/stay rates on real data and tried to make them approximately equal for different sequencing speeds. You're right that skips are still somewhat frequent. You could try decreasing the thresholds or window lengths to generate more stays and fewer skips, but I haven't developed a systematic way to adjust them.
Thanks! By the way, may I ask where can I learn this event segmentation carefully like the figure you used? Because it seems that there isn't a paper about scrappie and its event detect.
I made that figure myself by inserting print statements into the event detection C++ code, which isn't very reproducible and I didn't explore it much further, so that's why it isn't published. I've also experimented with a python rolling t-test implementation, which I'll include below. It only returns t-scores and peaks from one windowed t-test, which isn't as accurate as the consensus between two windows, but it's easier to understand and make plots. The names of "height" and "threshold" parameters are swapped to match scipy's "find_peaks" function, which isn't exactly the same as Scrappie's implementation but produces similar results
import numpy as np
import bottleneck as bn
from scipy.signal import find_peaks
def roll_ttest(sig, win, height=1.4, threshold=0.2, abs=True):
mean1 = np.roll(bn.move_mean(sig, win),-win+1)
mean2 = np.roll(mean1,-win)
var1 = np.roll(bn.move_var(sig, win),-win+1)
var2 = np.roll(var1, -win)
tscore = np.roll((mean1 - mean2) / np.sqrt((var1+var2)/win),win)
tscore[:win] = np.nan
tscore[-win:] = np.nan
if abs:
tscore = np.abs(tscore)
peaks,prop = find_peaks(tscore,threshold=threshold,height=height,distance=win)
return tscore,peaks
Thanks, now it's very clear!
Hi Sam,
I go through your code in uncalled4, I'm not very clear about your parameter choose. Because I'm trying to balance the skip errors and stay errors. It seems in current parameter, there are much more skip errors in the event. And I still cannot understand too much about the first five parameters in this EventDetector. Thanks.