hasindu2008 / slow5tools

Slow5tools is a toolkit for converting (FAST5 <-> SLOW5), compressing, viewing, indexing and manipulating data in SLOW5 format.
https://hasindu2008.github.io/slow5tools
MIT License

Error concerning conflict of run-ids in multi-fast5 file #117

Open sachingadakh opened 1 week ago

sachingadakh commented 1 week ago

Hello Hasindu,

We performed adaptive sampling for methylation enrichment. Since it was run on a Mk1B connected to a GPU-enabled workstation, only adaptive sampling was performed, not basecalling. We therefore received only pod5 files, which I merged into a single pod5 file and then converted to fast5 files. To check whether there were any bad fast5 files, I attempted to convert them to blow5 using slow5tools f2s and received the following error:

slow5tools-v1.1.0/./slow5tools f2s . -d blow_test/
[list_all_items] Looking for '*.fast5' files in .
[list_all_items] Looking for '*.fast5' files in ./blow5
[list_all_items] Looking for '*.fast5' files in ./blow_test
[f2s_main] 166 fast5 files found - took 0.001s
[f2s_main] Just before forking, peak RAM = 0.000 GB
[f2s_iop] 8 proceses will be used.
[f2s_iop] Spawning 8 I/O processes to circumvent HDF hell.
[fast5_attribute_itr::ERROR] Ancient fast5: Different run_ids found in an individual multi-fast5 file. Cannot create a single header slow5/blow5. Consider --allow option.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file ./sarcoma_tumour_PO1.95_0.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file './sarcoma_tumour_PO1.95_0.fast5'.
[fast5_attribute_itr::ERROR] Ancient fast5: Different run_ids found in an individual multi-fast5 file. Cannot create a single header slow5/blow5. Consider --allow option.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file ./sarcoma_tumour_PO1.17_0.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file './sarcoma_tumour_PO1.17_0.fast5'.
[fast5_attribute_itr::ERROR] Ancient fast5: Different run_ids found in an individual multi-fast5 file. Cannot create a single header slow5/blow5. Consider --allow option.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file ./sarcoma_tumour_PO1.31_0.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file './sarcoma_tumour_PO1.31_0.fast5'.
[fast5_attribute_itr::ERROR] Ancient fast5: Different run_ids found in an individual multi-fast5 file. Cannot create a single header slow5/blow5. Consider --allow option.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file ./sarcoma_tumour_PO1.100_0.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file './sarcoma_tumour_PO1.100_0.fast5'.
[f2s_iop] Child process 697597 exited with status=1.
(base) rakieta@rakieta-ThinkStation-P620:/disk2/ONT_files/ONT_methylation_analysis/fast5_data$
[fast5_attribute_itr::ERROR] Ancient fast5: Different run_ids found in an individual multi-fast5 file. Cannot create a single header slow5/blow5. Consider --allow option.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file ./sarcoma_tumour_PO1.99_0.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file './sarcoma_tumour_PO1.99_0.fast5'.
[fast5_attribute_itr::ERROR] Ancient fast5: Different run_ids found in an individual multi-fast5 file. Cannot create a single header slow5/blow5. Consider --allow option.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file ./sarcoma_tumour_PO1.93_0.fast5.
[f2s_child_worker::ERROR] Bad fast5: Could not read contents of the fast5 file './sarcoma_tumour_PO1.93_0.fast5'.
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0

I then ran the same command with the --allow option (/slow5tools f2s . -d blow5_test2/ --allow) and received the following warnings; it reported 0 bad fast5:

[f2s_main::WARNING] You have requested to allow run ID mismatches. Generated files are only to be used for intermediate analysis and are not recommended for archiving.
[list_all_items] Looking for '*.fast5' files in .
[list_all_items] Looking for '*.fast5' files in ./blow5
[list_all_items] Looking for '*.fast5' files in ./blow5_test2
[list_all_items] Looking for '*.fast5' files in ./blow_test
[f2s_main] 166 fast5 files found - took 0.001s
[f2s_main] Just before forking, peak RAM = 0.000 GB
[f2s_iop] 8 proceses will be used.
[f2s_iop] Spawning 8 I/O processes to circumvent HDF hell.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header. This warning is suppressed now onwards.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header. This warning is suppressed now onwards.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header. This warning is suppressed now onwards.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header. This warning is suppressed now onwards.
[f2s_child_worker::INFO] Summary - total fast5: 19, bad fast5: 0
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header. This warning is suppressed now onwards.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header.
[search_and_warn::WARNING] slow5tools-v1.1.0: Ancient fast5: Different run_ids found in an individual multi-fast5 file. First seen run_id will be set in slow5 header. This warning is suppressed now onwards.
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0
[f2s_main] Converting 166 fast5 files took 63.078s
[f2s_main] Children processes: CPU time = 487.949 sec | peak RAM = 0.279 GB

[main] cmd: /home/rakieta/slow5tools-v1.1.0/./slow5tools f2s . -d blow5_test2/ --allow
[main] real time = 63.080 sec | CPU time = 0.005 sec | peak RAM = 0.006 GB
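As a quick sanity check, the per-process summary lines in the logs above can be totalled to confirm that all 166 fast5 files were seen and none were flagged bad. A minimal sketch, assuming only the `Summary - total fast5: N, bad fast5: M` line format shown in the output (the function name `tally_f2s_log` is my own, not part of slow5tools):

```python
import re

# Matches slow5tools f2s per-child summary lines, e.g.
# "[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0"
SUMMARY_RE = re.compile(r"Summary - total fast5: (\d+), bad fast5: (\d+)")

def tally_f2s_log(log_text):
    """Sum the per-child totals reported in an f2s log."""
    total = bad = 0
    for m in SUMMARY_RE.finditer(log_text):
        total += int(m.group(1))
        bad += int(m.group(2))
    return total, bad

log = """[f2s_child_worker::INFO] Summary - total fast5: 19, bad fast5: 0
[f2s_child_worker::INFO] Summary - total fast5: 21, bad fast5: 0"""
print(tally_f2s_log(log))  # (40, 0)
```

Running it over the full `--allow` log above (one 19-file child plus seven 21-file children) gives (166, 0), matching the reported file count.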

My question is: what could cause the error "Ancient fast5: Different run_ids found in an individual multi-fast5 file."? And why, with the --allow option, does the same error become just a warning, the conversion to blow5 proceeds, and the "bad fast5" count from the earlier error drops to zero?

This is modern sequencing with R10 chemistry and an up-to-date MinKNOW performing adaptive sampling, so why does the error report "ancient fast5"? Will this affect downstream indexing, alignment and methylation calling with f5c? Should I have converted pod5 to fast5 differently?

Thank you

hasindu2008 commented 1 week ago

Hello

I guess you stopped the run for washing the flowcell and restarted a couple of times? That creates separate runIDs.

Somehow, when you converted pod5 to fast5, reads from these multiple runIDs got mixed into individual files. This used to be the case in ancient multi-fast5 files created through ONT's single-to-multi fast5 converter; it seems the pod5-to-fast5 converter follows that approach. When you provide --allow, slow5tools simply takes the first runID it sees and saves all records under that runID. This means you will lose some metadata about the run. f5c does not use that metadata, so it should be fine. Nevertheless, I would not recommend going down this path. Instead, for pod5 to blow5 conversion, you should use blue-crab.
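The first-seen-runID behaviour described above can be sketched in a few lines of Python. This is an illustrative model of what `--allow` does per file, not slow5tools' actual implementation, and `collapse_run_ids` is a hypothetical helper name:

```python
# Illustrative model of f2s --allow: a multi-fast5 file contributes
# (read_id, run_id) pairs; the resulting slow5/blow5 header keeps only
# the first run_id encountered, and every read is filed under it, so
# acquisition metadata for the other run_ids is lost.

def collapse_run_ids(reads):
    """reads: iterable of (read_id, run_id) pairs from one multi-fast5.
    Returns (header_run_id, read_ids, dropped_run_ids)."""
    header_run_id = None
    read_ids = []
    dropped = set()
    for read_id, run_id in reads:
        if header_run_id is None:
            header_run_id = run_id      # first seen run_id wins
        elif run_id != header_run_id:
            dropped.add(run_id)         # this run's metadata is lost
        read_ids.append(read_id)
    return header_run_id, read_ids, sorted(dropped)

# Example: one file mixing reads from two runs (e.g. before/after a wash)
mixed = [("r1", "runA"), ("r2", "runB"), ("r3", "runA")]
header, reads, lost = collapse_run_ids(mixed)
print(header, len(reads), lost)  # runA 3 ['runB']
```

All three reads survive (hence "0 bad fast5" in the summary), but anything recorded in runB's tracking/context metadata is no longer represented in the output header.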

You can first do

blue-crab p2s pod5_dir -d slow5_dir/

then:

slow5tools merge slow5_dir/ -o merged.blow5

sachingadakh commented 1 week ago

Hello Hasindu,

Thank you for the response, and sorry for getting back late. We had indeed restarted after a wash: a second run with a fresh loading of library for the same sample. We ended the first run early because we observed a large number of deactivated pores due to a bubble; we then washed the flow cell and loaded a new library as a second run, which produced most of the output. As I mentioned earlier, I merged the pod5 files from the first and second runs and then converted the single pod5 to fast5 files. As you said, that is why there are separate runIDs. My goal in converting to blow5 was to check for bad fast5 files before proceeding with the f5c methylation calling pipeline. I tried running blue-crab as you suggested but ran into an error. I will try to fix it and get back to you. Thank you.

hasindu2008 commented 6 days ago

What was the error you faced?

sachingadakh commented 5 days ago

It's related to Slurm, so it's an issue with the cluster I am using. I am working on resolving it.