Closed mrvollger closed 5 years ago
Isn't that what fast5_subset does? https://github.com/nanoporetech/ont_fast5_api/blob/master/README.rst#fast5_subset
Not exactly, I have a list of fast5s (not a list of read ids) that I want to combine into a single fast5, and some of those fast5s may be single_fast5s and others may be multi_fast5s. Unless I am misunderstanding what subset can do.
fast5_subset
almost does what you're after -- it takes a list of read_ids and a folder of fast5 files (either single or multi, or a mix), and then extracts the appropriate reads from them into a new set of multi-read fast5 files (with any specific number of reads per file, so you could just put a giant number here if you want to concatenate everything together). So if you're happy to start by compiling a list of all the read_ids in the files you've got then you could jury-rig fast5_subset
to do this:
fast5_subset -i <input_folder> -s <output_folder> --batch_size 9999999 [etc]
If you do want to write your own tool, then I'd suggest looking at the bits of mulit_fast5_subset
that copy reads from files -- you've already done most of the work getting the read_ids:
We've got an update to the project that should come out today (v1.4.7) that splits out a few of these steps and is a bit easier to read.
Great, I will check it out once it has been updated. Thanks!
Also when I try to click that link I get a 404 error. Is there another file you meant to reference, or will this file be part of the new release?
Ok, 1.4.7 is up (that ruined the above link, sorry! I have swapped to correctly using permalinks) -- here is the bit of fast5_subset
that copies reads into multi-read files:
And the bit you're after is probably going to be in read_generator()
(which takes those read_id
s and turns them into Fast5Read
objects):
Give it a try and let me know how it goes.
I'm trying to use either the fast5_subset or single_to_multi_fast5 and I keep getting this error anytime I try and run one of the console scripts:
ValueError: mpi4py.MPI.Status size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject
these are the versions of my dependancies...do i need to be running a specific version of openmpi?
ont-fast5-api (1.4.7) progressbar33 (2.4) six (1.12.0) numpy (1.17.0) h5py (2.9.0)
@fbrennen thanks this looks like it should be enough to get me started. If I get a working code snippet I will post it. For now I think I can close this.
I am looking to use your API to take an FOFN (file of file names) of fast5 files and then turn them into a single multifast5.
From you examples I can figure out how to iterate over the reads, but I am unsure how to write them to a multifast5. I tried reading the source code but I have not been able to figure out how reads are written to a fast5.
Any help would be appreciated. Below is some of the code I have been playing with: