Hi @AAnnan -- in general we don't target single-read files, as they've been deprecated for a long time now, though it's certainly something we could look into. From what you describe above it's not clear to me why you'd need single-read files at all though -- why not go from multi-read to demultiplexed multi-read?
I don't need single-read files; it's simply how I sometimes get the data to analyse.
1. Basecall and barcode with Guppy (no need to turn on fast5 output).
2. Take the summary file from that and your original multi-read files, and use `demux_fast5` to split them.
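For anyone landing here later, a minimal sketch of that two-step workflow (the paths, config, and barcode kit below are placeholders; flag names follow the Guppy and ont_fast5_api docs at the time of writing, so double-check `--help` on your versions):

```bash
# 1. Basecall + barcode with Guppy, fast5 output left off for speed.
guppy_basecaller \
    --input_path raw_fast5/ \
    --save_path basecalled/ \
    --config dna_r9.4.1_450bps_hac.cfg \
    --barcode_kits "EXP-NBD104"

# 2. Split the ORIGINAL multi-read fast5s by barcode, using the summary
#    produced in step 1.
demux_fast5 \
    --input raw_fast5/ \
    --save_path demuxed_fast5/ \
    --summary_file basecalled/sequencing_summary.txt
```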
Yeah, that's exactly what I do now (disabling the fast5 output speeds it up significantly). However, sometimes my original raw fast5 files are single-read.
(sorry, hit the wrong button!)
Great, ok -- I'm glad the multi-read case is working well for you. Out of curiosity, where do you get those single-read files?
A 2019 dual-enzyme methylation experiment; I'm reprocessing some of that data now. It's not super important -- I can always add a check for single- vs multi-read files and run `single_to_multi_fast5` in the case of singles.
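Something like this is what I have in mind -- a rough sketch, assuming `h5ls` from the HDF5 tools is available (multi-read fast5s have top-level HDF5 groups named `read_<uuid>`, while single-read files have `Raw`/`UniqueGlobalKey` at the top level):

```bash
# Peek at one file to decide whether the run is single- or multi-read.
first=$(find raw_fast5/ -name '*.fast5' | head -n 1)

if h5ls "$first" | grep -q '^read_'; then
    echo "multi-read fast5s, nothing to convert"
else
    # single_to_multi_fast5 ships with ont_fast5_api; flags may vary by version.
    single_to_multi_fast5 \
        --input_path raw_fast5/ \
        --save_path multi_fast5/ \
        --batch_size 4000
fi
```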
Hi,
I'm a Megalodon user primarily, and since Megalodon doesn't have barcoding capabilities yet, demultiplexing needs to be performed separately beforehand.
Originally, for single-read fast5 files, I launched a Guppy basecall run with barcoding and `fast5_out` enabled, plus a little bash script to make lists of read_ids by barcode from `sequencing_summary.txt` and retrieve the single-read fast5s from the `workspace` folder (a sketch of that script is below); I then reconverted them to multi-read, both for storage and because Megalodon performs better on multi-read files. For multi-read fast5 files I did the same, only running `multi_to_single_fast5` before the first step.

`demux_fast5` looks well suited for me: it combines my bash script retrieving read_ids and barcode info with `single_to_multi_fast5`, while saving space by extracting the barcoded fast5s from the raw data instead of the Guppy-basecalled ones (leaving out unnecessary sequence and basecalling information). However, it performs extremely poorly on single-read fast5 files. Is there a way to make it perform better on single fast5s? That would save me a run of `single_to_multi_fast5`.
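The read_id-per-barcode step can be as small as an awk script over the summary; this sketch assumes the usual Guppy column names (`read_id` and `barcode_arrangement` in a tab-separated header), so adjust to your summary:

```bash
# Write one read_id list per barcode, named e.g. barcode01_read_ids.txt.
awk -F'\t' '
    NR == 1 {
        for (i = 1; i <= NF; i++) {
            if ($i == "read_id") r = i
            if ($i == "barcode_arrangement") b = i
        }
        next
    }
    { print $r >> ($b "_read_ids.txt") }
' sequencing_summary.txt
```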
PS: Something even simpler would be for `guppy_barcoder` to be able to take in raw fast5s and barcode them directly. Or for Guppy to demultiplex not only the fastq output but also the fast5 output when the `fast5_out` flag is enabled...