I have a dataset of thousands of fast5 files, a mix of multi-read and single-read files, in a non-homogeneous folder structure.
I want to convert all of them to multi-read fast5 files.
My solution is to first convert all multi-read fast5 files to single-read files. The multi_to_single_fast5 command converts only the multi-read files and writes the resulting single-read files to a new folder:
The above command writes every read found in any multi-read fast5 file as a single-read fast5 file in $single_path.
Then I can convert them all back to multi-read files and make sure I am not missing any reads:
The above command works fine too.
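For reference, the round trip described above can be sketched as follows. This is not the exact pair of commands I ran, only a minimal reconstruction: it assumes the ont_fast5_api command-line tools are on the PATH and accept the long options --input_path, --save_path, --recursive, --threads, --filename_base and --batch_size as in the version I have installed. The helper names and the $multi_path output folder are hypothetical.

```python
import subprocess


def multi_to_single_cmd(input_path, save_path, threads=1):
    """Build the argv for splitting multi-read files into single-read files."""
    return [
        "multi_to_single_fast5",
        "--input_path", str(input_path),
        "--save_path", str(save_path),
        "--recursive",
        "--threads", str(threads),
    ]


def single_to_multi_cmd(input_path, save_path, batch_size=4000, threads=1):
    """Build the argv for packing single-read files into multi-read files."""
    return [
        "single_to_multi_fast5",
        "--input_path", str(input_path),
        "--save_path", str(save_path),
        "--filename_base", "batch",  # assumed prefix for the output files
        "--batch_size", str(batch_size),
        "--recursive",
        "--threads", str(threads),
    ]


def run_round_trip(orig_path, single_path, multi_path, threads=1):
    """Run both steps in order; stop at the first command that fails."""
    for cmd in (
        multi_to_single_cmd(orig_path, single_path, threads),
        single_to_multi_cmd(single_path, multi_path, threads=threads),
    ):
        subprocess.run(cmd, check=True)
```

Calling run_round_trip(orig_path, single_path, multi_path) only covers the reads that started out in multi-read files, which is why the original single-read files still need separate handling.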
Now I want to run the single_to_multi_fast5 command on a folder that contains both multi-read and single-read fast5 files ($orig_path), and I expect it to collect all the reads from the single-read files in $orig_path into multi-read files.
But I don't get all the reads from the single-read fast5 files: some reads are missing from the output folder. The same command works fine on a folder that contains only single-read fast5 files.
Nothing is overwritten and I am testing these steps on a few files in a different folder.
Is there a solution to this problem other than checking whether each file is multi-read or single-read? My dataset is huge, so I cannot check the files individually by hand. It would take ages.
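To make concrete what "checking every file" would involve, here is a minimal sketch of an automated check. It assumes h5py is installed and relies on a layout heuristic rather than any official API: multi-read files keep each read in a top-level group named read_<id>, while single-read files have top-level groups such as Raw and UniqueGlobalKey. The function names and the staging folder are hypothetical. Only the top-level group names are read, not the signal data, so the per-file cost should be small.

```python
from pathlib import Path


def classify_keys(top_level_keys):
    """Guess the fast5 layout from the names of the top-level HDF5 groups."""
    keys = set(top_level_keys)
    if any(key.startswith("read_") for key in keys):
        return "multi"
    if "Raw" in keys or "UniqueGlobalKey" in keys:
        return "single"
    return "unknown"


def classify_file(path):
    """Open one fast5 file read-only and classify it by its top-level groups."""
    import h5py  # third-party dependency, imported only when a file is opened

    try:
        with h5py.File(path, "r") as handle:
            return classify_keys(handle.keys())
    except OSError:
        # Truncated or corrupt files cannot be opened as HDF5.
        return "unknown"


def sort_fast5(orig_path, staging_path):
    """Symlink every .fast5 under orig_path into staging_path/<kind>/.

    staging_path must lie outside orig_path. The numeric prefix keeps files
    with the same name in different subfolders from colliding.
    """
    counts = {"single": 0, "multi": 0, "unknown": 0}
    for index, src in enumerate(sorted(Path(orig_path).rglob("*.fast5"))):
        kind = classify_file(src)
        dest_dir = Path(staging_path) / kind
        dest_dir.mkdir(parents=True, exist_ok=True)
        (dest_dir / f"{index:08d}_{src.name}").symlink_to(src.resolve())
        counts[kind] += 1
    return counts
```

After sort_fast5(orig_path, staging_path), single_to_multi_fast5 could be pointed at the single subfolder alone, assuming it follows symlinks (copying the files instead would avoid that assumption), and the files under multi would need no conversion. The returned counts give a quick check that no file was skipped.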