nanoporetech / ont_fast5_api

Oxford Nanopore Technologies fast5 API software

fast5_subset failing mid-run #51

Closed: irenenewton closed this issue 3 years ago

irenenewton commented 3 years ago

I'm running fast5_subset in a Docker container with Python 3 and ont-fast5-api (v3.3.0) installed, plus all dependencies, like so:

root@42aca274f9f3:/data# ./run_fast5.sh

Oddly, the script runs up until 2% of the reads have been extracted using the subset flatfile, then it fails with the following error:

DEBUG:h5py._conv:Creating converter from 5 to 3
| 0% ETA: --:--:--
Traceback (most recent call last):
| 2% ETA: 1:02:08
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 261, in extract_selected_reads
    output_f5.add_existing_read(read, target_compression=target_compression)
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/multi_fast5.py", line 82, in add_existing_read
    self._add_read_from_multi(read_to_add, target_compression, sanitize=sanitize)
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/multi_fast5.py", line 105, in _add_read_from_multi
    if read_to_add.run_id in self.run_id_map:
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/multi_fast5.py", line 69, in run_id_map
    for read in self.get_reads():
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/multi_fast5.py", line 27, in get_reads
    yield Fast5Read(self, group_name[5:])
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/fast5read.py", line 61, in __init__
    self.handle = parent.handle["read" + read_id]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.8/dist-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (bad object header version number)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/fast5_subset", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 326, in main
    multifilter.run_batch()
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 103, in run_batch
    self._launch_sync_tasks()
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 129, in _launch_sync_tasks
    reads, out_file, in_file = extract_selected_reads(*args_tuple)
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 269, in extract_selected_reads
    raise ExtractionException(exception, output_file)
ont_fast5_api.conversion_tools.fast5_subset.ExtractionException: (KeyError("Error processing file Run_12_6_2020/Run_12_6_2020/Run_12_6_2020/20201206_2104_MN30516_FAO46609_3100efdb/fast5_pass/FAO46609_pass_83a97ca0_95.fast5: ('Unable to open object (bad object header version number)',)"), 'Run_12_6_2020/Run_12_6_2020/Run_12_6_2020/20201206_2104_MN30516_FAO46609_3100efdb/batch1.fast5')
root@42aca274f9f3:/data#

fbrennen commented 3 years ago

Hi @irenenewton -- thanks for letting us know. Could you try running the single file it complained about (Run_12_6_2020/Run_12_6_2020/Run_12_6_2020/20201206_2104_MN30516_FAO46609_3100efdb/fast5_pass/FAO46609_pass_83a97ca0_95.fast5) to see if that crashes fast5_subset? If it does, and you're ok with giving us the file, we can try and figure out exactly what's going on.

irenenewton commented 3 years ago

Interesting new error:

root@e9216b63e9b0:/# fast5_subset -i Run_12_6_2020/Run_12_6_2020/Run_12_6_2020/20201206_2104_MN30516_FAO46609_3100efdb/fast5_pass/FAO46609_pass_83a97ca0_95.fast5 -s Run_12_6_2020/Run_12_6_2020/Run_12_6_2020/20201206_2104_MN30516_FAO46609_3100efdb/ -l list_reads_mapped_to_virus.txt
Traceback (most recent call last):
  File "/usr/local/bin/fast5_subset", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 315, in main
    multifilter = Fast5Filter(input_folder=args.input,
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 35, in __init__
    assert path.isdir(input_folder)
AssertionError

Happy to share the file with you. Email me and I can provide a link.

fbrennen commented 3 years ago

Hi @irenenewton -- the error you're getting there is because the -i argument to fast5_subset expects a folder, not a file. If you put that file in its own folder it should work. I'll get in touch with you about the file though.
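For anyone hitting the same bare AssertionError: the tracebacks in this thread show two checks in the Fast5Filter constructor, `path.isdir(input_folder)` and `path.isfile(read_list_file)`. The sketch below stages a single fast5 file in its own folder and runs the same two checks up front with readable messages. The helper names `stage_single_file` and `check_subset_args` are made up for illustration and are not part of ont-fast5-api.

```python
import os
import shutil
from typing import List


def stage_single_file(fast5_path: str, staging_dir: str) -> str:
    """Copy one fast5 file into its own folder so the folder can be passed to -i."""
    os.makedirs(staging_dir, exist_ok=True)
    # shutil.copy2 returns the path of the copy inside staging_dir
    return shutil.copy2(fast5_path, staging_dir)


def check_subset_args(input_folder: str, read_list_file: str) -> List[str]:
    """Mirror the two assertions seen in the tracebacks, but explain what is wrong."""
    problems = []
    if not os.path.isdir(input_folder):
        problems.append(f"-i must be an existing folder, got: {input_folder}")
    if not os.path.isfile(read_list_file):
        problems.append(f"-l must be an existing file, got: {read_list_file}")
    return problems
```

An empty list from `check_subset_args` means both assertions should pass. A failure on the second check usually means the read list path does not resolve from the current working directory.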

irenenewton commented 3 years ago

I was giving it a folder, originally, not a file. The original submission script pointed it to a directory (fast5_pass). When I get a second I can rerun the fast5 file you pointed to in its own dir. Here's the same error after moving that 95.fast5 file to its own dir:

root@e9216b63e9b0:/# fast5_subset -i test/ -s Run_12_6_2020/Run_12_6_2020/Run_12_6_2020/20201206_2104_MN30516_FAO46609_3100efdb/ -l list_reads_mapped_to_virus.txt
Traceback (most recent call last):
  File "/usr/local/bin/fast5_subset", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 315, in main
    multifilter = Fast5Filter(input_folder=args.input,
  File "/usr/local/lib/python3.8/dist-packages/ont_fast5_api/conversion_tools/fast5_subset.py", line 36, in __init__
    assert path.isfile(read_list_file)
AssertionError

fbrennen commented 3 years ago

Yes, in the original submission script, absolutely! I was only talking about your second attempt. =)

Please email the file to support@nanoporetech.com and ask them to pass it to me.

fbrennen commented 3 years ago

Hi @irenenewton -- apologies for the delay. I've had a look at the particular file that was in your error messages (FAO46609_pass_83a97ca0_95.fast5) and that file appears to be corrupt. We can definitely improve how we handle these files in ont-fast5-api, but in the meantime you can remove that file from the ones you're using and (assuming there are no other corrupt fast5 files) your call to fast5_subset should then work.
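If more than one file might be affected, a scan along these lines could flag unreadable files before a long fast5_subset run. This is only a sketch, not something ont-fast5-api provides: it assumes h5py is installed, the names `check_with_h5py` and `find_unreadable_fast5` are hypothetical, and it only opens each top-level group (the step that failed in the traceback above), so damage deeper inside a file may go unnoticed.

```python
import os
from typing import Callable, List, Tuple


def check_with_h5py(path: str) -> None:
    """Open every top-level object in an HDF5 file; raise if any is unreadable."""
    import h5py  # imported lazily so the scan logic works without h5py in tests

    with h5py.File(path, "r") as handle:
        for name in handle:
            handle[name]  # a damaged object header raises KeyError here


def find_unreadable_fast5(
    folder: str, check: Callable[[str], None] = check_with_h5py
) -> List[Tuple[str, str]]:
    """Return (path, error message) for each .fast5 file that fails the check."""
    bad = []
    for root, _dirs, files in os.walk(folder):
        for filename in sorted(files):
            if not filename.endswith(".fast5"):
                continue
            path = os.path.join(root, filename)
            try:
                check(path)
            except Exception as error:  # e.g. OSError or KeyError from h5py
                bad.append((path, str(error)))
    return bad
```

Running `find_unreadable_fast5("fast5_pass")` would return the files to move out of the input folder before calling fast5_subset again. The `check` parameter is there so a stricter test can be swapped in.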