nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

fast5_subset to take list of .fast5 files #75

Closed: MustafaElshani closed this issue 1 year ago

MustafaElshani commented 1 year ago

Would it be possible for 'fast5_subset' to take a list of .fast5 files from which to extract the requested read_ids?

I have a large dataset that I have already analysed: I know the read_ids I want and which .fast5 file each one resides in. I just want to extract those reads and write them to new .fast5 files, as 'fast5_subset' already does. I think this would save time, since only the relevant files would need to be searched.

Mustafa

hb-nanopore commented 1 year ago

Hello @MustafaElshani,

fast5_subset already has an option to take a list of files to subset from, along with a list of read ids (though I admit this is not yet documented in the README). Pass --file_list together with --read_id_list:

fast5_subset -h
usage: Tool for extracting reads from a multi_read_fast5_file by read_id [-h] -i INPUT -s SAVE_PATH -l READ_ID_LIST [-f FILENAME_BASE] [-n BATCH_SIZE] [-t THREADS] [-r] [--ignore_symlinks] [-c {vbz,vbz_legacy_v0,gzip,None}]
                                                                         [--file_list FILE_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to Fast5 file or directory of Fast5 files
  -s SAVE_PATH, --save_path SAVE_PATH
                        Folder to output MultiRead subset to
  -l READ_ID_LIST, --read_id_list READ_ID_LIST
                        File containing list of read ids to extract (or sequencing_summary.txt file)
  -f FILENAME_BASE, --filename_base FILENAME_BASE
                        Root of output filename, default='batch' -> 'batch0.fast5'
  -n BATCH_SIZE, --batch_size BATCH_SIZE
                        Number of reads per multi-read file (default 4000)
  -t THREADS, --threads THREADS
                        Maximum number of threads to use
  -r, --recursive       Search recursively through folders for MultiRead fast5 files
  --ignore_symlinks     Ignore symlinks when searching recursively for fast5 files
  -c {vbz,vbz_legacy_v0,gzip,None}, --compression {vbz,vbz_legacy_v0,gzip,None}
                        Target output compression type
  --file_list FILE_LIST
                        File containing names of files to search in
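For the use case in the original request (read_ids already known, along with the file each one lives in), the two list files can be generated with a short script. The sketch below is a minimal illustration, not part of ont_fast5_api. It assumes that both lists are plain text with one entry per line, that --file_list entries are file names (not full paths), and that the sequencing summary is tab-separated with `read_id` and `filename` columns; the column name in particular differs between basecaller versions, so check your own header. The function names and output file names are hypothetical. Only flags shown in the help text above are used.

```python
import csv
from pathlib import Path


def mapping_from_summary(summary_path, wanted_ids, file_column="filename"):
    """Return {read_id: fast5 file name} for the wanted reads.

    Assumes a tab-separated sequencing summary with a 'read_id' column and
    a file-name column called `file_column` (an assumption; adjust to your
    basecaller's header).
    """
    wanted = set(wanted_ids)
    mapping = {}
    with open(summary_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row["read_id"] in wanted:
                mapping[row["read_id"]] = row[file_column]
    return mapping


def write_subset_lists(read_to_file, out_dir):
    """Write the read-id list and the file list, one entry per line.

    Returns (read_id_list_path, file_list_path). The output file names
    are hypothetical; any names work.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    read_ids_path = out_dir / "read_ids.txt"
    files_path = out_dir / "file_list.txt"
    # Every requested read id, sorted for reproducibility
    read_ids_path.write_text("".join(f"{r}\n" for r in sorted(read_to_file)))
    # Each source .fast5 file listed once, even if it holds several reads
    unique_files = sorted(set(read_to_file.values()))
    files_path.write_text("".join(f"{f}\n" for f in unique_files))
    return read_ids_path, files_path


def build_command(input_dir, save_path, read_ids_path, files_path,
                  recursive=False):
    """Assemble a fast5_subset argv using flags from the help output."""
    cmd = [
        "fast5_subset",
        "--input", str(input_dir),
        "--save_path", str(save_path),
        "--read_id_list", str(read_ids_path),
        "--file_list", str(files_path),
    ]
    if recursive:
        # Only needed when the .fast5 files sit in subfolders of input_dir
        cmd.append("--recursive")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, or joined with spaces and pasted into a shell. Writing each source file only once keeps the --file_list short when many wanted reads share a file.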