nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

multi_to_single_fast5 generated only 4000 single fast5 #55

Closed Johnsonzcode closed 3 years ago

Johnsonzcode commented 3 years ago

Hi, Forrest

I want to use multi_to_single_fast5 to get single fast5 files for downstream analysis. Unfortunately, the following command produced only 4000 single fast5 files:

(nanopolish) [poultrylab1@pbsnode01 run1]$ multi_to_single_fast5 -i run1_pass_fast5 -s run1_pass_fast5_single --recursive -t 100
| 1 of 1|#############################################################################################################################################################################################################|100% Time: 0:00:06
(nanopolish) [poultrylab1@pbsnode01 run1]$ cd run1_pass_fast5_single/0/ && ls |wc -l
4000

Need help! Thanks in advance!

fbrennen commented 3 years ago

Hi @Johnsonzcode -- how many fast5 files are in that folder? Can you print some of the output of ls run1_pass_fast5/0/ here?

Johnsonzcode commented 3 years ago

4000

Johnsonzcode commented 3 years ago

-rw-rw-r-- 1 poultrylab1 poultrylab1 81K Jun 24 16:22 008d21c2-36d2-4a7c-ac97-a35b8c55cca7.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 184K Jun 24 16:22 00847582-c127-4736-bd57-3e70714ef741.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 64K Jun 24 16:22 00846d89-a848-4a55-a530-dfccbba4c941.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 505K Jun 24 16:22 007a8686-bc2a-46c9-8628-4a7dbb30c2c3.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 85K Jun 24 16:22 004c541a-81d9-4865-8535-ddd6fb85f051.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 172K Jun 24 16:22 00498b18-5bcc-4f73-bba0-414a3b67df3e.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 97K Jun 24 16:22 003dffcc-d0cc-40c8-8a33-b260006d9569.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 47K Jun 24 16:22 001cede0-f1c2-4fbc-b415-75c5e7ca9e77.fast5
-rw-rw-r-- 1 poultrylab1 poultrylab1 166K Jun 24 16:22 001130e1-836c-44ce-bb6f-c5af6d8659a5.fast5

The last 9 lines.

Johnsonzcode commented 3 years ago

The size of my multi-read fast5 file:

(nanopolish) [poultrylab1@pbsnode01 run1_pass_fast5]$ ll
total 456G
-rwxrwxr-x 1 poultrylab1 poultrylab1 456G Jun 22 21:33 all_fast5_pass.fast5

fbrennen commented 3 years ago

Apologies, I meant how many fast5 files are there in your input folder (run1_pass_fast5) -- what you've pasted above looks like the output folder to me. Is it just that one file from your ll command?

Johnsonzcode commented 3 years ago

There is only one file, all_fast5_pass.fast5, in run1_pass_fast5:

(nanopolish) [poultrylab1@pbsnode01 run1]$ ll run1_pass_fast5
total 456G
-rwxrwxr-x 1 poultrylab1 poultrylab1 456G Jun 22 21:33 all_fast5_pass.fast5

Johnsonzcode commented 3 years ago

I used h5ls to check my fast5 file:

(nanopolish) [poultrylab1@pbsnode01 run1_pass_fast5]$ h5ls all_fast5_pass.fast5 | wc -l
4000

It seems there are only 4000 fast5 records in it, but I received far more than 4000 reads from the biotech company for this sequencing run.

fbrennen commented 3 years ago

Ah, ok, so it does look like multi_to_single is working correctly. Usually there will be 4000 reads per multi-read file (just like you're seeing), so I would have expected your biotech company to provide you with many more files than just the one. I would check with them and see if there are more files, or if this file has been packed in some odd way that's hiding some of the reads -- a size of 456G suggests there should be quite a lot more reads in there than just 4000.
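The size argument above can be made concrete with a rough back-of-the-envelope calculation. This is only a sketch: it assumes the nine single-read files listed earlier in the thread are representative of the run (they are simply the last nine lines of an ls listing, not a random sample), that the K and G suffixes are powers of 1024, and that per-read storage is similar in single-read and multi-read files. All variable names are illustrative.

```python
# Sizes (KiB) of the nine single-read files shown in the ls output above.
sample_sizes_kib = [81, 184, 64, 505, 85, 172, 97, 47, 166]

multi_file_gib = 456   # size of all_fast5_pass.fast5 reported by ll
reads_found = 4000     # reads reported by h5ls and multi_to_single_fast5

mean_kib = sum(sample_sizes_kib) / len(sample_sizes_kib)
multi_file_kib = multi_file_gib * 1024 * 1024

# Size each read would need if the file really held only 4000 reads.
implied_mib_per_read = multi_file_kib / reads_found / 1024

# Reads the file could hold if each read were about the sample mean size.
estimated_reads = multi_file_kib / mean_kib

print(f"mean single-read size: {mean_kib:.0f} KiB")
print(f"implied size per read at 4000 reads: {implied_mib_per_read:.0f} MiB")
print(f"rough read count at the sample mean size: {estimated_reads:,.0f}")
```

Under these assumptions, 4000 reads would have to average roughly 117 MiB each, whereas the listed reads are between 47 KiB and 505 KiB. At the sample mean size the file would have room for on the order of three million reads, which is why the 4000 figure looks inconsistent with the file size.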

Johnsonzcode commented 3 years ago

Yes, based on the file size I think this fast5 file should contain many more reads (single fast5 records). Have you ever encountered this situation?

fbrennen commented 3 years ago

I haven't! We don't normally put more than 4000 reads into a file, so I'm curious how your biotech company generated this large one. Is there any chance it's a zip file or another archive of some sort?
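One way to check the archive question is to look at the file's leading bytes. The sketch below uses only the Python standard library; the function name classify_container is hypothetical and not part of ont_fast5_api. It assumes the HDF5 signature sits at offset 0, which is the usual case but not guaranteed (HDF5 permits a user block before the signature), so an "unknown" result is not conclusive.

```python
import tarfile
import zipfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature
GZIP_MAGIC = b"\x1f\x8b"           # first two bytes of a gzip stream


def classify_container(path):
    """Return a rough label for the on-disk format of the file at `path`.

    Hypothetical helper for illustration only. It checks the HDF5
    signature at offset 0, then gzip, zip, and tar in that order.
    """
    with open(path, "rb") as handle:
        header = handle.read(8)
    if header == HDF5_MAGIC:
        return "hdf5"
    if header[:2] == GZIP_MAGIC:
        return "gzip"
    if zipfile.is_zipfile(path):
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "unknown"


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, classify_container(name))
```

Saved as a script, it could be run against all_fast5_pass.fast5. A result of "hdf5" would mean the file is a plain HDF5 container rather than a zip, tar, or gzip archive, although it would not by itself explain where the remaining space goes.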

Johnsonzcode commented 3 years ago

I am contacting the company right now. Maybe they used another piece of software to compress the fast5 files.

Johnsonzcode commented 3 years ago

Thanks a lot!

ssscj commented 4 months ago

Thanks a lot!

Hi, I ran into the same problem. Have you found a solution? Thanks.