nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

Some problem of multi_to_single_fast5 #74

Closed: Bin-Ma closed this issue 1 year ago

Bin-Ma commented 1 year ago

Hi @fbrennen, I have a problem with multi_to_single_fast5. I recently wanted to split my fast5 data into single-read fast5 files for tombo analysis. I combined my fast5 files into a single fast5 file (about 11 GB) and tried to split it into a set of single-read fast5 files. However, the output of multi_to_single_fast5 is only 268 MB, and tombo did not work on it. I don't know how to solve this problem. I look forward to your reply. The command I used:

multi_to_single_fast5 -i barcode01.fast5 -s ./barcode01 -t 1

Bin

hb-nanopore commented 1 year ago

Heya Bin,

I am not sure why you are experiencing issues. Could you please give me some more information?

  1. What was the output on the command line when you ran the command you have given, above? (Could you paste it here?)
  2. What version of ont_fast5_api are you on? (Check with pip freeze, or in Python run import ont_fast5_api and then ont_fast5_api.__version__.)
  3. How many reads are in the multiread fast5 file (barcode01.fast5)?
  4. What is in the output folder? (Inside barcode01 there should be a filename_mapping.txt plus directories containing lots of single read fast5s.)
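Questions 2 and 4 can be answered from one short script. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and that the package is installed under its PyPI name `ont-fast5-api`; the helper names `installed_version` and `summarize_output_folder` are hypothetical, and the folder name `barcode01` is taken from the command above.

```python
import os
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def summarize_output_folder(folder):
    """Count .fast5 files anywhere under `folder` and report whether
    filename_mapping.txt is present at the top level."""
    n_fast5 = 0
    for _root, _dirs, files in os.walk(folder):
        n_fast5 += sum(1 for name in files if name.endswith(".fast5"))
    has_mapping = os.path.isfile(os.path.join(folder, "filename_mapping.txt"))
    return n_fast5, has_mapping


if __name__ == "__main__":
    # Hypothetical usage: run from the directory that contains barcode01/
    print("ont_fast5_api version:", installed_version("ont-fast5-api"))
    if os.path.isdir("barcode01"):
        count, mapping = summarize_output_folder("barcode01")
        print(f"single-read fast5 files: {count}, filename_mapping.txt present: {mapping}")
```

The file count can then be compared directly with the number of reads expected in the input.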

I look forward to your reply. Thanks,

Hayleigh

Bin-Ma commented 1 year ago

Hi @hb-nanopore, thanks for your reply.

  1. The log file is available below.
  2. The version of ont_fast5_api is 4.0.2.
  3. There are 15573 reads in the multiread fast5 file, counted from the FASTQ file after basecalling.
  4. The output folder contains a subfolder with 4000 fast5 files, plus filename_mapping.txt. I have uploaded filename_mapping.txt and one of the output fast5 files. Attachments: barcode01.log, filename_mapping.txt, 0a2a7ca7-b51d-4dff-966a-8034034ce974.fast5.zip
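The 15573 figure in point 3 comes from the FASTQ, which covers every fast5 file that was basecalled. A minimal sketch of that count, assuming standard 4-line FASTQ records (header, sequence, `+`, quality) with no line-wrapped sequences; the function name and the example filename are hypothetical.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming the standard 4-line layout."""
    # Transparently handle gzip-compressed FASTQ (.gz) as well as plain text
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For example, `count_fastq_records("barcode01.fastq")` (hypothetical path) returns the number of basecalled reads, which can then be compared with the read IDs found in the fast5 files.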

Thanks again for your help. I look forward to your reply.

Bin

hb-nanopore commented 1 year ago

Thank you.

It looks like your multiread fast5 contains the standard 4000 reads. You can check by counting the read IDs inside it, e.g.:

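from ont_fast5_api.multi_fast5 import MultiFast5File

f = "/path/to/barcode01.fast5"
# Open the multiread file read-only and print how many read IDs it contains
with MultiFast5File(f, mode="r") as reads:
    print(len(reads.get_read_ids()))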

However, if you have a directory containing many multiread fast5s (i.e. the original directory that you basecalled to get 15573 reads), you can pass the whole directory (rather than the path to a single fast5 file) to multi_to_single_fast5, and it will convert all of them to single read fast5s. Would that help?
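Before converting a whole directory, it can help to confirm that its multiread files add up to the expected read count. This sketch extends the single-file check above to a directory; `find_fast5_files` and `count_reads_in_directory` are hypothetical helpers, and only `MultiFast5File` and `get_read_ids()` come from ont_fast5_api as used in the snippet above.

```python
import glob
import os


def find_fast5_files(directory, recursive=False):
    """Return sorted paths of .fast5 files in `directory`, optionally recursing."""
    if recursive:
        # "**" with recursive=True matches zero or more subdirectories
        pattern = os.path.join(directory, "**", "*.fast5")
    else:
        pattern = os.path.join(directory, "*.fast5")
    return sorted(glob.glob(pattern, recursive=recursive))


def count_reads_in_directory(directory, recursive=False):
    """Sum the read IDs across every multiread fast5 in `directory`."""
    # Imported here so find_fast5_files() still works without ont_fast5_api
    from ont_fast5_api.multi_fast5 import MultiFast5File

    per_file = {}
    for path in find_fast5_files(directory, recursive):
        with MultiFast5File(path, mode="r") as f5:
            per_file[path] = len(f5.get_read_ids())
    return sum(per_file.values()), per_file
```

Usage: `total, per_file = count_reads_in_directory("/path/to/fast5_dir")`. If `total` matches the FASTQ count, the directory can be given to the tool as `multi_to_single_fast5 -i /path/to/fast5_dir -s ./barcode01 -t 1`; for nested subfolders, the tool's `--recursive` option is likely needed too (confirm with `multi_to_single_fast5 --help`).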

Hayleigh

Bin-Ma commented 1 year ago

Thanks for your reply. I have tried that on my datasets, but it didn't work well. Strangely, the same command successfully split the same dataset last year, and I can't explain why. Is there any other way to split the fast5 file? Thank you for your patience. I look forward to your reply.

Bin

hb-nanopore commented 1 year ago

From everything you have sent me, as far as I can tell your file barcode01.fast5 has 4000 reads in it, so multi_to_single_fast5 is actually working correctly. If you think this is not the case, please check how many read IDs are in barcode01.fast5 as I described above.
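The numbers in this thread are consistent with that reading. Assuming the run wrote at most 4000 reads per multiread file (the "standard 4000" mentioned above), 15573 basecalled reads would span four multiread files, so a single file holding exactly 4000 reads is expected. A small sketch of the arithmetic; the function name is hypothetical.

```python
import math


def expected_file_layout(total_reads, reads_per_file=4000):
    """Split `total_reads` into multiread files of at most `reads_per_file` reads."""
    n_files = math.ceil(total_reads / reads_per_file)
    sizes = [reads_per_file] * (total_reads // reads_per_file)
    remainder = total_reads % reads_per_file
    if remainder:
        sizes.append(remainder)  # the last, partially filled file
    return n_files, sizes


print(expected_file_layout(15573))  # (4, [4000, 4000, 4000, 3573])
```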

This is the only tool for splitting multiread fast5s.

It also works on directories of fast5s, which I would recommend in your case. You mentioned having problems using it this way; could you describe the issues?

Hayleigh

Bin-Ma commented 1 year ago

Yes, you are right: multi_to_single_fast5 is working correctly. I solved the tombo problem, which was caused by the h5py version. Thank you very much for your replies. Sincerely yours, Bin
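The thread does not say which h5py version caused the tombo failure. As an assumption for illustration only, this sketch treats h5py 3.x as too new for tombo and reports the installed version against that limit; adjust `max_major` to whatever your tombo release actually requires. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string, e.g. '3.7.0' -> 3."""
    return int(version_string.split(".", 1)[0])


def check_h5py(max_major=2):
    """Report the installed h5py version and flag it if its major version
    exceeds `max_major` (ASSUMPTION: tombo needs h5py < 3)."""
    try:
        installed = version("h5py")
    except PackageNotFoundError:
        return "h5py is not installed"
    if major_version(installed) > max_major:
        return f"h5py {installed} may be too new for tombo (assumed limit: < {max_major + 1})"
    return f"h5py {installed} is within the assumed range"


print(check_h5py())
```

If the check flags your environment and the assumption holds for your tombo version, installing an older h5py in the environment used for tombo is one option.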