nanoporetech / megalodon

Megalodon is a research command-line tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

Segmentation fault (core dumped) on running Megalodon #292

Open nmierzam opened 2 years ago

nmierzam commented 2 years ago

Hi, I am trying to run the following command on some fast5 files to do a 6mA methylation analysis.

megalodon raw_fast5s/ \
--guppy-server-path /home/nataly/nanopore/ont-guppy-cpu/bin/guppy_basecall_server \
--guppy-params "-d /home/nataly/nanopore/rerio/basecall_models/ --num_callers 20" \
--guppy-config res_dna_r941_min_modbases-all-context_v001.cfg \
--outputs basecalls mappings mod_mappings mods per_read_mods \
--reference /home/nataly/nanopore/ReferenceGenomes/pLJM1-EGFP_reference_seq.fasta \
--process 4 \
--overwrite \

This command fails with the following message:

[11:13:23] Running Megalodon version 2.5.0
[11:13:23] Loading guppy basecalling backend
Segmentation fault (core dumped)

I attached the output log files.

log.txt

guppy_basecall_server_log-2022-06-08_11-13-23.log

I am using:

Python 3.6.9
Megalodon version: 2.5.0
Guppy-cpu version 6.1.7

I am running Ubuntu 18.04 with 8 GB RAM in VMware, with Windows 10 Enterprise as the host.

Any help is appreciated.

marcus1487 commented 2 years ago

@AlexdeMendoza , @nmierzam , @marcDabad This looks potentially like a regression in guppy 6.1.* on rerio models. Could you try to run megalodon on a standard canonical model to confirm that this issue is specific to models from rerio?
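The comparison suggested above can be set up as two runs that differ only in the model configuration. The following is a minimal sketch, not a definitive recipe: it uses only flags and paths that appear in this thread, the helper name `megalodon_cmd` is hypothetical, and the canonical config `dna_r9.4.1_450bps_fast.cfg` is assumed to be available in the local guppy installation. The outputs are reduced to `basecalls mappings` because a canonical model produces no modified base calls.

```python
import shlex


def megalodon_cmd(fast5_dir, server_path, config, reference, guppy_params=None):
    """Build a megalodon argument list (hypothetical helper, flags taken from this thread)."""
    cmd = [
        "megalodon", fast5_dir,
        "--guppy-server-path", server_path,
        "--guppy-config", config,
        "--outputs", "basecalls", "mappings",
        "--reference", reference,
        "--processes", "4",
        "--overwrite",
    ]
    if guppy_params:
        # Only the rerio run needs extra server parameters (model directory).
        cmd += ["--guppy-params", guppy_params]
    return cmd


# Settings shared by both runs, copied from the original report.
common = dict(
    fast5_dir="raw_fast5s/",
    server_path="/home/nataly/nanopore/ont-guppy-cpu/bin/guppy_basecall_server",
    reference="/home/nataly/nanopore/ReferenceGenomes/pLJM1-EGFP_reference_seq.fasta",
)

rerio = megalodon_cmd(
    config="res_dna_r941_min_modbases-all-context_v001.cfg",
    guppy_params="-d /home/nataly/nanopore/rerio/basecall_models/ --num_callers 20",
    **common,
)
canonical = megalodon_cmd(config="dna_r9.4.1_450bps_fast.cfg", **common)

# Print shell-quoted commands so they can be pasted into a terminal.
for name, cmd in (("rerio", rerio), ("canonical", canonical)):
    print(name + ":", " ".join(shlex.quote(arg) for arg in cmd))
```

If the canonical run gets past "Loading guppy basecalling backend" while the rerio run still crashes, that would point at the rerio models rather than at the Megalodon installation.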

P.S. Remora models are now the recommended models for modified base detection. Using Remora for modified base detection may also side-step this issue.

AlexdeMendoza commented 2 years ago

@marcus1487, indeed: according to guppy_basecall_server_log-2022-06-10_21-45-04.log, the server kept looking for files that are missing from the rerio models. I am now running it with these parameters:

megalodon PromFast5/ \
--guppy-server-path /usr/bin/guppy_basecall_server \
--guppy-config dna_r9.4.1_450bps_fast.cfg \
--remora-modified-bases dna_r9.4.1_e8 fast 0.0.0 5hmc_5mc CG 0

And it seems to be running well.

I assume these are the default Remora models, which should work for both PromethION and MinION, no? I was using rerio models with the hope of looking at non-CG methylation and, at some point, 6mA.
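A quick way to confirm the "missing files" symptom before starting a run is to list the files a guppy config refers to and check that they exist in the model directory. This is a minimal sketch under a stated assumption: that the `.cfg` file consists of `key = value` lines and that keys ending in `_file` (such as `model_file`) name files expected in the data directory. The function names are hypothetical and the real config format should be checked against an actual file.

```python
from pathlib import Path


def referenced_files(cfg_text):
    """Return {key: value} for 'key = value' lines whose key ends in '_file'.

    Assumption: guppy configs are plain 'key = value' text with '#' comments.
    """
    found = {}
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("_file") and value:
            found[key] = value
    return found


def missing_files(cfg_path, data_dir):
    """List files named in the config that are not present in data_dir."""
    cfg_text = Path(cfg_path).read_text()
    data_dir = Path(data_dir)
    return sorted(
        name
        for name in referenced_files(cfg_text).values()
        if not (data_dir / name).is_file()
    )


# In-memory demonstration with a made-up config fragment.
sample = "# Basecalling.\nmodel_file = example_model.jsn\nchunk_size = 2000\n"
print(referenced_files(sample))
```

For the original report, `missing_files` would be called with the rerio config path and `/home/nataly/nanopore/rerio/basecall_models/` as the data directory; an empty list would suggest the crash has another cause.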

marcus1487 commented 2 years ago

There is an all-context 5mC Remora model (only kit14 atm) in Rerio. We are working very hard to get a 6mA all-context Remora model released as soon as possible.

AlexdeMendoza commented 2 years ago

Thanks! Well, my data was generated with an older kit, that's why Rerio theoretically gives some extra flexibility for the non-CpG contexts (or 6mA). So what should we do to get Rerio models working with guppy 6.1.7 / megalodon 2.5? So let's say these models from Remora come out soon, how would we specify these when running megalodon? I am not entirely sure if the default remora models I am calling in the above command are part of megalodon or guppy 6.1.7 installations.

marcus1487 commented 2 years ago

In terms of all-context mods for R9.4.1, the Rerio model is the best option at the moment. The guppy devs will have a look at this next week, as this seems to be a guppy issue.

Instructions for running research Remora models (from Rerio) can be found in the Rerio README https://github.com/nanoporetech/rerio#remora-models.
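When narrowing down whether a crash like the one reported here comes from a particular binary, it can help to launch that command from Python and inspect how it ended. The sketch below is generic and makes no assumption about guppy or Megalodon flags: the caller supplies the full command, and both function names are hypothetical. On POSIX systems, `subprocess` reports a process killed by a signal as a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        # Negative codes mean the process was terminated by a signal (POSIX).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run a command (list of arguments) and describe how it terminated."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Placeholder command; replace with the command under investigation.
    print(run_and_describe(["true"]))
```

A result of "killed by SIGSEGV" corresponds to the "Segmentation fault (core dumped)" message in the report. Note that a command started through a shell wrapper typically reports status 139 instead of a negative code.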

Lucas-Servi commented 2 years ago

I'm having an issue similar to #294 with config files. I'm running Python=3.9 megalodon=2.5.0 and guppy=6.1.7 on Pop OS 22.04. I'm using r10.4 flowcells and SLK112 kit.

Guppy itself works perfectly fine with many different config files, but guppy_basecall_server (when launched by megalodon) doesn't seem to load config files properly.

megalodon /.../fast5_pass \
--guppy-config dna_r10.4_e8.1_hac.cfg \
--remora-modified-bases dna_r10.4_e8.1 sup 0.0.0 5hmc_5mc CG 0 \
--outputs basecalls mod_basecalls mappings mod_mappings mods per_read_mods \
--devices 0 \
--processes 14 \
--output-directory /../outdir --overwrite \
--reference ...genomic.mmi
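One detail worth checking in a command like this is that the config name says `hac` while the `--remora-modified-bases` values say `sup`. Whether that is related to the config loading problem is not established in this thread. The sketch below only illustrates the check: the field names reflect a reading of the six values as pore, basecall model type, basecall model version, modified bases, Remora model type and Remora model version, which should be verified against `megalodon --help`, and the heuristic of looking for the model type as an underscore-separated token in the config name is an assumption.

```python
from typing import NamedTuple


class RemoraSpec(NamedTuple):
    """The six values passed to --remora-modified-bases (field names are assumed)."""
    pore: str
    basecall_model_type: str
    basecall_model_version: str
    modified_bases: str
    remora_model_type: str
    remora_model_version: str


def parse_remora_spec(values):
    """Build a RemoraSpec from the six command-line values."""
    if len(values) != 6:
        raise ValueError(f"expected 6 values, got {len(values)}")
    return RemoraSpec(*values)


def basecall_type_matches(guppy_config, spec):
    """Heuristic: is the basecall model type a '_'-separated token of the config name?"""
    stem = guppy_config[:-4] if guppy_config.endswith(".cfg") else guppy_config
    return spec.basecall_model_type in stem.split("_")


spec = parse_remora_spec("dna_r10.4_e8.1 sup 0.0.0 5hmc_5mc CG 0".split())
print(basecall_type_matches("dna_r10.4_e8.1_hac.cfg", spec))
```

Applied to the earlier working command in this thread (`dna_r9.4.1_450bps_fast.cfg` with `dna_r9.4.1_e8 fast 0.0.0 5hmc_5mc CG 0`), the same check passes.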

nmierzam commented 2 years ago

> There is an all-context 5mC Remora model (only kit14 atm) in Rerio. We are working very hard to get a 6mA all-context Remora model released as soon as possible.

Good to know that!

nmierzam commented 2 years ago

I ran a megalodon command with the --remora-modified-bases parameter mentioned by AlexdeMendoza, and it is working fine. I am going to try to analyze the modified bases with Tombo because my data is from the Direct RNA sequencing protocol.

Thank you!!

marcDabad commented 2 years ago

In my case it was because I had old flowcells, so I had to re-run it with rerio models. It worked with an older guppy version (5.0.16).