MustafaElshani closed this issue 1 year ago
Hello @MustafaElshani,
fast5_subset already has the option to take a list of files from which to subset, along with a list of read ids (though I admit the README documentation for this is not up to date). The argument --file_list can be provided along with --read_id_list:
fast5_subset -h
usage: Tool for extracting reads from a multi_read_fast5_file by read_id
       [-h] -i INPUT -s SAVE_PATH -l READ_ID_LIST [-f FILENAME_BASE]
       [-n BATCH_SIZE] [-t THREADS] [-r] [--ignore_symlinks]
       [-c {vbz,vbz_legacy_v0,gzip,None}] [--file_list FILE_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to Fast5 file or directory of Fast5 files
  -s SAVE_PATH, --save_path SAVE_PATH
                        Folder to output MultiRead subset to
  -l READ_ID_LIST, --read_id_list READ_ID_LIST
                        File containing list of read ids to extract (or sequencing_summary.txt file)
  -f FILENAME_BASE, --filename_base FILENAME_BASE
                        Root of output filename, default='batch' -> 'batch0.fast5'
  -n BATCH_SIZE, --batch_size BATCH_SIZE
                        Number of reads per multi-read file (default 4000)
  -t THREADS, --threads THREADS
                        Maximum number of threads to use
  -r, --recursive       Search recursively through folders for MultiRead fast5 files
  --ignore_symlinks     Ignore symlinks when searching recursively for fast5 files
  -c {vbz,vbz_legacy_v0,gzip,None}, --compression {vbz,vbz_legacy_v0,gzip,None}
                        Target output compression type
  --file_list FILE_LIST
                        File containing names of files to search in
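As a rough illustration of combining the two arguments, the sketch below writes the two list files and assembles the command line from Python. It only uses flags that appear in the help text above. The helper names (write_lines, build_subset_command), the example paths, and the one-entry-per-line layout of both list files are assumptions made for illustration; the help text does not say whether --file_list expects bare file names or full paths, so check that against your installed version.

```python
import shlex
from pathlib import Path


def write_lines(path, items):
    """Write one item per line (assumed layout for both list files)."""
    Path(path).write_text("\n".join(items) + "\n")


def build_subset_command(input_dir, save_path, read_id_list, file_list,
                         recursive=False):
    """Assemble a fast5_subset call using only flags shown in the help text."""
    cmd = [
        "fast5_subset",
        "-i", str(input_dir),         # Fast5 file or directory of Fast5 files
        "-s", str(save_path),         # folder for the new multi-read files
        "-l", str(read_id_list),      # file listing the read ids to extract
        "--file_list", str(file_list),  # file listing the files to search in
    ]
    if recursive:
        cmd.append("-r")              # search sub-folders as well
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; nothing is written or executed here.
    cmd = build_subset_command("fast5_pass", "subset_out",
                               "read_ids.txt", "fast5_files.txt",
                               recursive=True)
    print(shlex.join(cmd))
    # To run it for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if any of the paths contain spaces.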
Would it be possible for 'fast5_subset' to take a list of .fast5 files from which to extract the read_ids?
I have a large dataset which I have already analysed. I know the read_ids and which .fast5 file each one resides in, so I just want to extract them and make new .fast5 files, as 'fast5_subset' already does. I think this would save time.
Mustafa