nanoporetech / megalodon

Megalodon is a research command-line tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

Guppy server returned invalid read: 'movement' #215

Closed danrdanny closed 2 years ago

danrdanny commented 2 years ago

Hi, I'm running Megalodon 2.3.4 with Guppy 5.0.7 as the basecall server on a WGS sample generated on an R9 PromethION flow cell.

Command I used:

megalodon \
    /path/to/fast5/ \
    --outputs basecalls mappings mod_mappings mods \
    --reference /path/to/ref/hg38.no_alt.fa \
    --mod-map-emulate-bisulfite \
    --mod-map-base-conv C T --mod-map-base-conv m C \
    --devices 1 --processes 30 \
    --guppy-server-path ~/bin/ont-guppy_5.0.7/bin/guppy_basecall_server \
    --output-directory sample-try4 \
    --overwrite

Megalodon ran for ~3-4 hours, then printed the following error 20-30 times:

Process ReadWorker010:

Traceback (most recent call last):
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/backends.py", line 301, in parse_pyguppy_called_read
    move=read_datasets["movement"],
KeyError: 'movement'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/net/eichler/vol26/7200/software/modules-sw/miniconda/4.9.2/Linux/CentOS7/x86_64/envs/python3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/net/eichler/vol26/7200/software/modules-sw/miniconda/4.9.2/Linux/CentOS7/x86_64/envs/python3/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/megalodon.py", line 481, in _process_reads_worker
    for bc_res in model_info.iter_basecalled_reads(
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/backends.py", line 752, in iter_basecalled_reads
    for bc_res in self.pyguppy_run_model(
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/backends.py", line 1306, in pyguppy_run_model
    for called_read, sig_info, seq_summ_info in self.pyguppy_basecall(
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/backends.py", line 1155, in pyguppy_basecall
    for comp_read, read_id in self.pyguppy_get_completed_reads(
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/backends.py", line 1110, in pyguppy_get_completed_reads
    parse_pyguppy_called_read(called_read),
  File "/net/eichler/vol26/7200/software/modules-sw/megalodon/2.3.4/Linux/CentOS7/x86_64/megalodon_env/lib/python3.8/site-packages/megalodon/backends.py", line 307, in parse_pyguppy_called_read
    raise mh.MegaError(f"Guppy server returned invalid read: {str(e)}")
megalodon.megalodon_helper.MegaError: Guppy server returned invalid read: 'movement'
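
For anyone reading the chained traceback above: the first exception is a plain KeyError raised when the "movement" key is looked up, and the second is the MegaError raised while that KeyError was being handled. A minimal sketch of that pattern follows, assuming the read is an ordinary dict. `MegaError` here is a stand-in class, and `parse_called_read` and `describe` are hypothetical helpers for illustration, not Megalodon's actual code.

```python
class MegaError(Exception):
    """Stand-in for megalodon.megalodon_helper.MegaError (illustration only)."""


def parse_called_read(read_datasets):
    """Hypothetical parser: pull the field a caller needs from a read dict."""
    try:
        return {"move": read_datasets["movement"]}
    except KeyError as e:
        # str(KeyError("movement")) is "'movement'", which is why the final
        # error message ends with the quoted key name.
        raise MegaError(f"Guppy server returned invalid read: {str(e)}")


def describe(read_datasets):
    """Return the parsed read, or the error message if the read is invalid."""
    try:
        return parse_called_read(read_datasets)
    except MegaError as err:
        return str(err)
```

Calling `describe({})` returns the same message as the last line of the traceback, because a read without a "movement" entry triggers the KeyError path.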

The guppy_log file doesn't have any errors in it, just a series of timeout/disconnect messages:

2021-11-11 00:20:42.919479 [guppy/info] Client 21 anonymous_client_21 id: 1f57dc07-0df2-48ed-bca6-bf53e9ded92c has timed out.
2021-11-11 00:20:42.934684 [guppy/info] Client 21 anonymous_client_21 id: 1f57dc07-0df2-48ed-bca6-bf53e9ded92c has disconnected.
2021-11-11 00:21:04.936811 [guppy/info] Client 31 anonymous_client_31 id: aa1728c2-cfb9-45c1-bfec-6f0bceb1c4ce has timed out.
2021-11-11 00:21:04.936887 [guppy/info] Client 31 anonymous_client_31 id: aa1728c2-cfb9-45c1-bfec-6f0bceb1c4ce has disconnected.
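
To gauge how many clients dropped, the timeout and disconnect messages can be tallied. This is a rough sketch that assumes every line follows the format of the lines quoted above; other Guppy versions may log differently, and `count_client_events` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern derived from the quoted log lines only; treat it as an assumption.
CLIENT_EVENT = re.compile(
    r"^\S+ \S+ \[guppy/info\] Client (?P<num>\d+) \S+ id: (?P<id>\S+) "
    r"has (?P<event>timed out|disconnected)\.$"
)


def count_client_events(lines):
    """Count 'timed out' and 'disconnected' messages in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = CLIENT_EVENT.match(line.strip())
        if match:
            counts[match.group("event")] += 1
    return counts
```

Passing an open log file handle as `lines` works, since file objects iterate line by line.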

The log.txt file in the megalodon output dir didn't give any errors either, just a series of extraction messages:

DBG 00:27:38 : ReadIDsExtractedFrom: /path/to/fast5/PAH86453_1cb7e57a_1224.fast5 4000 --- FileFiller-ReadEnumThread001 fast5_io.py:298
DBG 00:27:38 : ReadIDsExtractedFrom: /path/to/fast5/PAH86453_1cb7e57a_2031.fast5 4000 --- FileFiller-ReadEnumThread005 fast5_io.py:298

Let me know if any other data would be helpful. Thanks!

marcus1487 commented 2 years ago

This bug was fixed in 2.3.5; see https://github.com/nanoporetech/megalodon/issues/203
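
A quick way to check whether an installed release already includes the fix is to compare version strings numerically. This is a minimal sketch that assumes plain dotted numeric versions such as 2.3.4; `parse_version` and `has_fix` are hypothetical helper names, not part of Megalodon.

```python
def parse_version(text):
    """Turn a dotted release string such as '2.3.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, fixed_in="2.3.5"):
    """Return True if the installed release is at or past the fixed release."""
    return parse_version(installed) >= parse_version(fixed_in)
```

The installed version string can be read from `pip show megalodon`, assuming the package was installed with pip under that name. For the setup in this report, `has_fix("2.3.4")` is False, so an upgrade is needed.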