skovaka / UNCALLED

Raw nanopore signal mapper that enables real-time targeted sequencing
MIT License

Remove mux scan windows from Flongle run #57

Open maximilianmordig opened 1 year ago

maximilianmordig commented 1 year ago

Hi Sam

I am looking at your simulator. Your simulation script identifies a window as a mux scan by assuming that exactly 4 muxes are traversed in order. I am wondering how to remove the mux scans for Flongle flow cells given only a sequencing summary file, where the mux is always equal to 1. To my knowledge, the sequencing summary file also contains reads recorded during a mux scan. Is it possible to detect them from the end reason "unblock_mux_change", which occurs 9149 times? There is also the end reason "mux_change", but it occurs only 13 times (24h Flongle run with 64 active pores) out of the 620k entries in the sequencing summary file. I wonder what the "mux_change" end reason means. Alternatively, is it possible to extract the mux scans from the bulk fast5?
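To make the end-reason idea concrete, here is a minimal sketch of how one might count end reasons and check whether "unblock_mux_change" reads cluster in time. It assumes the summary is tab-separated with `start_time`, `duration` and `end_reason` columns; the toy rows, the 600-second bin width and the function name `end_reason_summary` are made up for illustration. Whether such clusters actually correspond to mux scans on a Flongle is exactly the open question, so this is only a way to look at the data, not a detection method.

```python
import csv
import io
from collections import Counter


def end_reason_summary(tsv_text, reason="unblock_mux_change", bin_seconds=600.0):
    """Count reads per end reason and bin reads with `reason` by their end time.

    Column names are assumed to match a sequencing summary file;
    adjust them if your file differs.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    reason_counts = Counter()
    bin_counts = Counter()
    for row in reader:
        reason_counts[row["end_reason"]] += 1
        if row["end_reason"] == reason:
            end_time = float(row["start_time"]) + float(row["duration"])
            bin_counts[int(end_time // bin_seconds)] += 1
    return reason_counts, dict(sorted(bin_counts.items()))


# Toy data (hypothetical values, not from a real run).
toy_summary = "\n".join([
    "channel\tstart_time\tduration\tend_reason",
    "1\t10.0\t5.0\tsignal_positive",
    "2\t595.0\t3.0\tunblock_mux_change",
    "3\t5398.0\t4.0\tunblock_mux_change",
    "4\t5400.5\t2.0\tunblock_mux_change",
    "5\t5401.0\t1.0\tmux_change",
])

counts, bins = end_reason_summary(toy_summary)
print(counts)
print(bins)
```

If the reads with this end reason pile up in a few narrow time bins across many channels, that would be consistent with a synchronized event such as a scan; a flat distribution would suggest the end reason alone is not enough.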

Are the mux scans nowadays performed every 90 minutes or more often automatically? Are mux scans (as detected with find_scans in your code) the same as pore flushes? It seems that for ultra-long read sequencing, they are not performed in parallel across channels.

Moreover, is it possible to compute the ejection delay if the selective sequencing was not run with UNCALLED? Is this information contained in the bulk fast5?

Also, you say that UNCALLED only supports r9.4.1. Is there any fundamental difference with newer pores regarding the simulator?

I was wondering about the eject delay here: https://github.com/skovaka/UNCALLED/blob/4c0584ab60a811e74664b0b0d241257d39b967ae/uncalled/sim_utils.py#LL204C13-L206C70 Are the 'ej' or 'ub' tags also available from a real sequencing run (since it usually doesn't produce a paf file without alignment)? Is ej the extra delay because the read speed is not constant? More precisely, I don't understand the meaning of tlns[i] - ((p.qr_len/450.0)+ej).
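To spell out the units in that expression: dividing a length in bases by 450.0 gives seconds if 450 is read as a speed in bases per second (the nominal r9.4.1 rate). The sketch below assumes that `tlns[i]` is a total time in seconds and that `ej` is an ejection time in seconds. Both are guesses about the linked code, not something it confirms, and the function and parameter names are hypothetical.

```python
BASES_PER_SECOND = 450.0  # constant taken from the expression; read here as a speed


def residual_time(total_seconds, read_length_bases, eject_seconds):
    """Time left after subtracting sequencing time and ejection time.

    Mirrors tlns[i] - ((p.qr_len / 450.0) + ej) under the ASSUMPTION that
    tlns[i] is a duration in seconds and ej is an ejection time in seconds.
    """
    sequencing_seconds = read_length_bases / BASES_PER_SECOND
    return total_seconds - (sequencing_seconds + eject_seconds)


# A 900-base read takes 2.0 s at 450 bases/s; with 0.5 s of ejection,
# a 3.0 s total leaves 0.5 s unaccounted for.
example = residual_time(3.0, 900, 0.5)
print(example)
```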

skovaka commented 1 year ago

Sorry for the delayed response. The mux scans were included in the simulator because they were important for keeping the timing between reads consistent for each pore. They also served as a useful "refresh" period where timings are updated to simulate sparser sequencing later in the run. Flongles also likely sequence more sparsely later in the run, but without the mux scans I would have to find another way to simulate that. It's certainly not impossible, but it would require rewriting quite a bit, and I'm not very familiar with Flongle characteristics.

The simulator is only compatible with UNCALLED. MinKNOW didn't output any adaptive sampling metadata when UNCALLED was developed, so all the required tags are specialized. UNCALLED only supports r9.4.1, which has a much smaller k-mer space than the new r10 pores and is therefore a more tractable problem. As you may be aware, r9.4.1 is now "legacy" sequencing chemistry and will eventually be removed from the ONT store, so I'm not planning to add any new features to UNCALLED or the simulator.

You might be interested in this new simulator, which is designed to be compatible with any adaptive sampling method: https://www.biorxiv.org/content/10.1101/2023.05.16.540986v1

Thanks, Sam