hyeshik / poreplex

A versatile sequenced read processor for nanopore direct RNA sequencing

Poreplex stops with KeyError #7

Closed RaverJay closed 5 years ago

RaverJay commented 5 years ago

Processing a small batch of fast5s worked, however the full dataset (240k reads) errored out after some time:

2018-09-04 17:54:43,927 Starting poreplex version 0.1
2018-09-04 17:54:43,927 Command line: /home/ya86gul/python/Python_3.6.1/bin/poreplex -i reads -o output --trim-adapter --barcoding --basecall --parallel 10 --fastq --align combined.mmi --dump-basecalled-events --dashboard --contig-aliases ids_to_names.csv
2018-09-04 17:54:43,927 == Analysis settings ======================================
2018-09-04 17:54:43,927 Input: reads
2018-09-04 17:54:43,927 Output: output
2018-09-04 17:54:43,927 Processes: 10
2018-09-04 17:54:43,927 Presets: rna-r941.cfg
2018-09-04 17:54:43,927 Basecall on-the-fly: Yes (albacore 2.3.1)
2018-09-04 17:54:43,927 Trim 3' adapter: Yes
2018-09-04 17:54:43,927 Filter concatenated read: Yes
2018-09-04 17:54:43,927 Separate by barcode: Yes
2018-09-04 17:54:43,927 Real-time alignment: Yes
2018-09-04 17:54:43,927 FASTQ in output: Yes
2018-09-04 17:54:43,927 FAST5 in output: No
2018-09-04 17:54:43,927 Basecall table in output: Yes
2018-09-04 17:54:43,927 ===========================================================
2018-09-04 17:54:43,927
2018-09-04 18:10:18,614 Unhandled exception KeyError: 'read_id'
2018-09-04 18:10:18,643 Traceback (most recent call last):
2018-09-04 18:10:18,643 File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/poreplex/pipeline.py", line 95, in process_batch
2018-09-04 18:10:18,643 if res['read_id'] in barcode_assg:
2018-09-04 18:10:18,643 KeyError: 'read_id'
2018-09-04 18:10:18,808 Finished.

Any idea?

hyeshik commented 5 years ago

Hello RaverJay,

Thank you for reporting the problem! This looks like the bug that was resolved in report #1. The master branch of the git repo should include the fix. Can you try again after updating to HEAD? I am planning to release a new version that includes these bug fixes soon.

RaverJay commented 5 years ago

Hey thanks for the info.

I tried pip uninstall poreplex, then pip install git+https://github.com/hyeshik/poreplex.git, which finally gives: Successfully installed poreplex-0.2

But I still run into the KeyError. Did I not update properly?

EDIT: nvm, the reinstall actually seems to have fixed it.

The computation completes, but the output included:

2018-09-05 14:14:08,325 * Failed to open: 53
2018-09-05 14:14:08,325 - File could not be opened due to unknown error: 53

poreplex.log includes these errors:

2018-09-05 13:08:42,199 [signal_analyzer.py:121] Unhandled exception OSError: Unable to open file (file signature not found)
Traceback (most recent call last):
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/poreplex/signal_analyzer.py", line 121, in process
    return SignalAnalysis(filename, self).process()
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/poreplex/signal_analyzer.py", line 189, in __init__
    self.open_data_files(filename)
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/poreplex/signal_analyzer.py", line 201, in open_data_files
    self.fast5 = Fast5File(fast5path, 'r')
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/ont_fast5_api/fast5_file.py", line 70, in __init__
    self.status = Fast5Info(self.filename)
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/ont_fast5_api/fast5_info.py", line 60, in __init__
    with h5py.File(fname, 'r') as handle:
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/ya86gul/python/Python_3.6.1/lib/python3.6/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

Ideas what causes this? Dropping 53 reads isn't the worst, but it might be a bug.

hyeshik commented 5 years ago

Good! A normal FAST5 file must be a valid HDF5 file. Could you please check whether other HDF5 tools can open the failed files? ViTables and HDFView both provide convenient access to such files.

RaverJay commented 5 years ago

Sorry I took so long - I don't see how I can find the filenames of the failing files. The error messages don't include them, and there is no status value for this in sequencing_summary.txt.
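For anyone hitting the same wall: one possible way to locate such files without relying on the log is to look for the HDF5 signature directly, since "file signature not found" means h5py could not find it. The sketch below is not part of poreplex. The function names are made up for illustration, and it assumes the reads are plain .fast5 files under one directory. It only checks for the 8-byte signature at the offsets the HDF5 format allows (0, 512, 1024, and so on), so a file that is truncated further in would still pass.

```python
import os

# The 8-byte signature that marks the start of an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, ..."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # The superblock may sit at 0, then at 512 and successive doublings.
            offset = 512 if offset == 0 else offset * 2
    return False


def find_invalid_fast5(root):
    """Walk root (hypothetical helper) and list .fast5 files lacking the signature."""
    invalid = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".fast5"):
                path = os.path.join(dirpath, name)
                if not has_hdf5_signature(path):
                    invalid.append(path)
    return sorted(invalid)
```

Calling find_invalid_fast5("reads") on the input directory from the command line above should return the paths of files that cannot be valid HDF5, which can then be opened in another HDF5 tool to confirm.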

hyeshik commented 5 years ago

@RaverJay, the new release now prints a detailed error message and the FAST5 file name in poreplex.log. Please give it a try!

RaverJay commented 5 years ago

Yeah I just saw it, awesome release!

vitables also says these are not valid HDF5 files. No idea why, maybe copying failed for those few.
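If the original files are still around, the copying theory could be tested by comparing checksums between the source and the copy. This is only a rough sketch, assuming both directory trees have the same layout. The function names and the two directory arguments are placeholders, not anything poreplex provides.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large FAST5 files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_copies(source_dir, copy_dir):
    """Return relative paths under source_dir whose copy is missing or differs."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            source_path = os.path.join(dirpath, name)
            rel = os.path.relpath(source_path, source_dir)
            copy_path = os.path.join(copy_dir, rel)
            # A missing copy counts as a mismatch; otherwise compare digests.
            if not os.path.isfile(copy_path):
                mismatched.append(rel)
            elif sha256_of(source_path) != sha256_of(copy_path):
                mismatched.append(rel)
    return sorted(mismatched)
```

If the files it reports line up with the ones poreplex could not open, a failed copy is the likely cause. If the checksums match, the files were probably already damaged at the source.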

Ran the new version, demultiplexing yield seems to have improved slightly, and I'll play around a bit with the fail reads also.

Thanks for your help!