rrwick / Deepbinner

a signal-level demultiplexer for Oxford Nanopore reads
GNU General Public License v3.0

how did you determine these parameters? #1

Closed: Huanle closed this issue 6 years ago

Huanle commented 6 years ago

Hi Ryan,

I was just wondering how you determined the values of these parameters in trim_signal.py:

initial_trim_size = 10
trim_increment = 25
stdev_threshold = 20
look_forward_windows = 5
window_count_threshold = 4

Did you choose them based on experience? If I am going to process direct RNA sequencing data, do I need to change these values?

Thanks,

Huanle

rrwick commented 6 years ago

Just trial and error, nothing too fancy. Ideally that whole function wouldn't need to exist, but some reads start with too much open pore signal, and if I don't trim it off I can miss the barcode signal. Even so, it's a messy process and I know it doesn't get it right every time, so training sets can contain some bogus data. I'm sure the function could be improved. That's also why, when classifying a read, Deepbinner scans multiple signal windows to look further into the read.
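As a rough illustration of how trimming parameters like these might interact, here is a minimal sketch. It is hypothetical and is not the code in Deepbinner's trim_signal.py: it assumes open pore signal is flat (low standard deviation) while the read proper is noisy, and that the signal is a plain list of raw current values. After always dropping the first initial_trim_size samples, it inspects the next look_forward_windows windows of trim_increment samples each, and keeps trimming another trim_increment samples while at least window_count_threshold of those windows have a standard deviation below stdev_threshold. The function name trim_open_pore is made up for illustration.

```python
import statistics
from typing import Sequence


def trim_open_pore(signal: Sequence[float],
                   initial_trim_size: int = 10,
                   trim_increment: int = 25,
                   stdev_threshold: float = 20.0,
                   look_forward_windows: int = 5,
                   window_count_threshold: int = 4) -> int:
    """Return the index where the trimmed signal should start.

    Hypothetical sketch, not Deepbinner's implementation: assumes open pore
    signal is flat (low standard deviation) and the read proper is noisy.
    """
    # Always drop the first few samples.
    trim = min(initial_trim_size, len(signal))
    while trim < len(signal):
        flat_windows = 0
        for i in range(look_forward_windows):
            start = trim + i * trim_increment
            window = signal[start:start + trim_increment]
            if len(window) < 2:
                break  # ran out of signal; stop looking ahead
            if statistics.pstdev(window) < stdev_threshold:
                flat_windows += 1
        if flat_windows < window_count_threshold:
            break  # enough of the look-ahead is noisy: assume the read has started
        trim += trim_increment  # still mostly open pore: trim another chunk
    return min(trim, len(signal))


if __name__ == "__main__":
    # Synthetic read: 200 flat open-pore samples, then noisy strand signal.
    read = [500.0] * 200 + [300.0, 400.0] * 200
    print(trim_open_pore(read))
```

For that synthetic read the sketch returns 110, not 200: because it stops as soon as fewer than 4 of the next 5 windows look flat, it leaves some open pore signal in place rather than risk cutting into the barcode. Thresholds like these would need checking against real reads from whatever chemistry is being processed.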

As a side note, are you doing barcoding with direct RNA sequencing? I see on the kit page on ONT's site it says 'Barcoding kits in development'. Are they available?

Ryan

Huanle commented 6 years ago

Thanks Ryan. Yes, I am doing barcoding with direct RNA sequencing, using the kit you pointed to.

Huanle

rrwick commented 6 years ago

Okay, interesting. Something to consider: when we do barcoding with the 1D ligation kits (whole genome DNA), I see a small fraction of reads (roughly 1%) that seem to have the wrong barcode. My hypothesis is that there are unligated barcode sequences left over, and then when the samples are pooled and the adapter is ligated on, some of these barcodes get ligated onto the wrong sample's DNA.

This 'barcode switching' at 1% is probably not a problem for WGS, but could maybe be an issue for transcriptomes. Remember the kerfuffle caused by Illumina barcode issues? Again, I don't know if this will happen in your data - just be wary of it as a possibility.
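To see why around 1% switching might matter more for transcript counts than for a genome, here is a back-of-the-envelope model. It is a hypothetical illustration, not something measured in this thread: each read keeps its true barcode with probability 1 - switch_rate and otherwise lands on one of the other barcodes uniformly at random. The function name observed_counts is made up.

```python
from typing import List, Sequence


def observed_counts(true_counts: Sequence[float], switch_rate: float) -> List[float]:
    """Expected per-barcode read counts for one transcript after barcode switching.

    Hypothetical model (an assumption, not measured data): each read keeps its
    true barcode with probability 1 - switch_rate; otherwise it is assigned to
    one of the other barcodes uniformly at random.
    """
    n = len(true_counts)
    if n < 2:
        # With a single barcode there is nothing to switch to.
        return [float(c) for c in true_counts]
    total = sum(true_counts)
    observed = []
    for count in true_counts:
        kept = count * (1 - switch_rate)
        # Reads switched away from the other n - 1 barcodes, shared evenly.
        arrived = (total - count) * switch_rate / (n - 1)
        observed.append(kept + arrived)
    return observed


if __name__ == "__main__":
    # One sample expresses the transcript; two samples do not.
    print(observed_counts([10000, 0, 0], 0.01))
```

Under this model, a transcript with 10,000 reads in one sample and none in two others would show about 50 spurious reads in each of those two samples at a 1% switch rate, which could be mistaken for low-level expression.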

Ryan

Huanle commented 6 years ago

@rrwick Thanks heaps for letting me know about this; I wasn't aware of it.

Huanle