nanoporetech / megalodon

Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transriptome.
Other
197 stars 30 forks source link

numpy.AxisError: axis 1 is out of bounds for array of dimension 1 #344

Open jazsakr opened 1 year ago

jazsakr commented 1 year ago

Hello!

I have been using Megalodon with R9.4.1 (Guppy 6.1.5 and Remora 1.0.0) and we started using R10.4.1. I would like to run Megalodon on data generated with the new chemistry to get the --write-mods-text output. So I updated Guppy (to 6.4.6) and Remora (to 2.0.0) to get the appropriate config files.

I found the Guppy config file by running ls ~/packages/ont-guppy_v6.4.6/data/ | grep 10.4.1 | grep 400 | grep mod | grep sup and got this dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg.

I found the Remora config file by running remora model list_pretrained

Pore                     Basecall_Model_Type    Basecall_Model_Version    Modified_Bases    Remora_Model_Type      Remora_Model_Version  Remora_Model_URL
-----------------------  ---------------------  ------------------------  ----------------  -------------------  ----------------------  --------------------------------
dna_r10.4.1_e8.2_400bps  sup                    v3.5.1                    5mc               CG                                        2  6zo86p9z4me6hl4di12cimbqjd9n7p25
dna_r10.4.1_e8.2_400bps  hac                    v3.5.1                    5mc               CG                                        2  aub3do2tzhgzrhu2o100lg80d8yv9t91
dna_r10.4.1_e8.2_400bps  fast                   v3.5.1                    5mc               CG                                        2  e8zczcd15rhhs6eppuwmkehwubo848nu
dna_r10.4.1_e8.2_400bps  sup                    v4.0.0                    5hmc_5mc          CG                                        2  whlux6wohu5fwyg4mreg5mz0a8iaugxk
dna_r10.4.1_e8.2_400bps  hac                    v4.0.0                    5hmc_5mc          CG                                        2  o1ah5qgd77l393gjnyrv44jt4wp6wxcl
dna_r10.4.1_e8.2_400bps  fast                   v4.0.0                    5hmc_5mc          CG                                        2  hms1t8ledf09p8ta9katj90l29uz2qll

Then I went to my sample folder with the raw data and ran the following:

megalodon ./ --guppy-server-path ~/packages/ont-guppy_v6.4.6_mega/bin/guppy_basecall_server \
--guppy-config dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg \
--guppy-params "--num_callers 4 --gpu_runners_per_device 2 --chunks_per_runner 120" \
--remora-modified-bases dna_r10.4.1_e8.2_400bps sup v3.5.1 5mc CG 2 \
--outputs mod_basecalls mod_mappings \
--mod-map-emulate-bisulfite --mod-map-base-conv C T --mod-map-base-conv m C \
--write-mods-text --sort-mappings \
--reference ~/references/mm39.fa.gz --devices cuda:0,1 \
--output-directory mega

After processing a some reads (status reached Read Processing: 2%), I got this error:

Traceback (most recent call last):
  File "/home/megalodon/miniconda3/lib/python3.10/site-packages/megalodon/megalodon.py", line 138, in handle_errors
    out_q.put((func(*args), r_vals))
  File "/home/megalodon/miniconda3/lib/python3.10/site-packages/megalodon/mods.py", line 1777, in call_read_mods
    r_mod_scores = call_read_mods_remora(
  File "/home/megalodon/miniconda3/lib/python3.10/site-packages/megalodon/mods.py", line 1696, in call_read_mods_remora
    log_probs = mh.log_softmax_axis1(mod_calls)[:, 1:].astype(np.float64)
  File "/home/megalodon/miniconda3/lib/python3.10/site-packages/megalodon/megalodon_helper.py", line 502, in log_softmax_axis1
    e_x = np.exp((x.T - np.max(x, axis=1)).T)
  File "<__array_function__ internals>", line 200, in amax
  File "/home/megalodon/miniconda3/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2820, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/megalodon/miniconda3/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.AxisError: axis 1 is out of bounds for array of dimension 1

Then it gets stuck at this point, not processing any other reads.

When I look at the error file unexpected_megalodon_errors.2213.err, it seems like certain reads are causing this:

$ more unexpected_megalodon_errors.737.err 
./m002/20230419_1927_X2_FAW41342_c99a977d/fast5_pass/FAW41342_pass_c99a977d_d088e559_2.fast5:::7782d97f-d3ed-4905-85d2-bb7a8b174a47
:::
Traceback (most recent call last):
# same error message as above

How do I solve this issue?

Thank you so much for any help!

marcus1487 commented 1 year ago

The modkit extract command from the modkit package was added last week to replace the --write-mods-text output. This new command works from the output BAM files from Guppy or Dorado containing modified base calls. I would highly recommend shifting to the new framework as megalodon is deprecated and not under active development.

jazsakr commented 1 year ago

Just started using it, thank you!