nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

single_to_multi_fast5 does not collect all the single files if the input folder contains mixed types of fast5 files. #81

Open Marjan-Hosseini opened 9 months ago

Marjan-Hosseini commented 9 months ago

I have a dataset that contains thousands of mixed multi and single fast5 files in a non-homogeneous folder structure. I want to convert all of them to multi fast5 files.

My solution is to first convert all multi fast5 files to single. The multi_to_single_fast5 command converts only the multi fast5 files to single, writing them to a new folder:

orig_path=mixed
save_path=multi
single_path=single
multi_to_single_fast5 -i $orig_path/ -s $single_path/ --recursive

The above command collects all the reads that exist in any multi fast5 files as single fast5 files in $single_path. Then I can convert them all back to multi and make sure I am not missing any read:

single_to_multi_fast5 -i $single_path/ -s $save_path/ --filename_base $output_name --batch_size 1000 --recursive

The above command works fine too. Now I want to run the single_to_multi_fast5 command on a folder that contains both multi and single fast5 files ($orig_path), and I expect it to collect all the reads from the single files in $orig_path into multi fast5 files.

single_to_multi_fast5 -i $orig_path/ -s $save_path/ --filename_base $output_name --batch_size 1000 --recursive

But I don't get all the reads from the single fast5 files: some reads are missing from the output folder. The same command works fine on a folder that contains only single fast5 files.
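To put a number on the single-read side of that comparison, here is a minimal sketch that counts the fast5 files under a folder. It is not part of ont_fast5_api. The helper name count_fast5 is hypothetical, and it assumes every read file ends in .fast5 and that no filename contains a newline. Since a single fast5 file holds one read, the count for a folder of single files equals the number of reads it holds.

```shell
# Hypothetical helper: count .fast5 files under a directory tree, recursively.
# Assumptions: files end in ".fast5" and filenames contain no newlines.
count_fast5() {
    # find prints one path per line; wc -l counts the lines;
    # tr strips the padding that some wc implementations add.
    find "$1" -type f -name '*.fast5' | wc -l | tr -d ' '
}

# Example call, using the variable defined earlier in this post:
# count_fast5 "$single_path"
```

This only gives the expected read count for folders that contain single files; counting the reads inside the multi fast5 output needs a different tool.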

Nothing is overwritten, and I am testing these steps on a few files in a different folder. Is there a solution to this problem other than checking whether every file is multi or single? My dataset is huge, so I cannot check the files one by one. It would take ages.