nanoporetech / megalodon

Megalodon is a research command line tool to extract high-accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information-rich basecalling neural network output to a reference genome/transcriptome.

running megalodon with all context DNA methylation models #319

Open AlineMuyle opened 1 year ago

AlineMuyle commented 1 year ago

I am using Megalodon to infer 5mC in all contexts (CG, CHG and CHH). After some investigation, it seems that the combination of Megalodon version 2.5.0 and Guppy basecall server 6.2.1 does not work with old Rerio models such as res_dna_r941_min_modbases-all-context_v001.cfg or res_dna_r941_min_modbases_5mC_v001.cfg.

This is a known issue that I have seen in various posts, such as https://github.com/nanoporetech/megalodon/issues/292 and https://bytemeta.vip/repo/nanoporetech/megalodon/issues/266. Usually people fall back to older versions (Megalodon v2.4.2 + Guppy 5.0.16) to be able to run their analyses.

I would like to use the latest versions of the programs, so it would be useful if you could please make the necessary changes for an all-context model to work with current versions.

For your information, my current installation works with the CG-context-only model dna_r9.4.1_450bps_fast.cfg, but with all-context models the job fails. More details are below.

I use the following configuration:

And the following command line:

    megalodon fast5/ \
        --guppy-params "-d ./rerio/basecall_models/" \
        --guppy-config res_dna_r941_min_modbases-all-context_v001.cfg \
        --mod-motif m C 0 \
        --guppy-server-path ./ont-guppy_6.2.1/bin/guppy_basecall_server \
        --output-directory ./male_megalodon_results_all_contexts --overwrite --mod-output-formats bedmethyl \
        --outputs basecalls mods mod_basecalls per_read_mods mod_mappings mappings \
        --reference scaffolds.fasta \
        --devices 0 1 \
        --processes 44 \
        --guppy-concurrent-reads 40 \
        --guppy-timeout 120 \
        --output-directory output-dir \
        --num-read-enumeration-threads 1 \
        --num-extract-signal-processes 2 \
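The command above passes `--output-directory` twice with different values. As a minimal sketch for spotting this kind of repetition in a long command line, the Python snippet below tokenizes the command with the standard `shlex` module and reports any long option that appears more than once. The helper name `repeated_options` is hypothetical, and the sketch makes no claim about which of the two values Megalodon ends up using.

```python
import shlex
from collections import Counter

# The command from this post, joined into a single string (line continuations removed).
COMMAND = (
    'megalodon fast5/ '
    '--guppy-params "-d ./rerio/basecall_models/" '
    '--guppy-config res_dna_r941_min_modbases-all-context_v001.cfg '
    '--mod-motif m C 0 '
    '--guppy-server-path ./ont-guppy_6.2.1/bin/guppy_basecall_server '
    '--output-directory ./male_megalodon_results_all_contexts --overwrite '
    '--mod-output-formats bedmethyl '
    '--outputs basecalls mods mod_basecalls per_read_mods mod_mappings mappings '
    '--reference scaffolds.fasta '
    '--devices 0 1 '
    '--processes 44 '
    '--guppy-concurrent-reads 40 '
    '--guppy-timeout 120 '
    '--output-directory output-dir '
    '--num-read-enumeration-threads 1 '
    '--num-extract-signal-processes 2'
)


def repeated_options(command):
    """Return the sorted list of long options (--name) that occur more than once."""
    tokens = shlex.split(command)  # respects the quoted "-d ..." argument
    counts = Counter(tok for tok in tokens if tok.startswith("--"))
    return sorted(opt for opt, n in counts.items() if n > 1)


print(repeated_options(COMMAND))
```

Because the quoted `--guppy-params` value is kept as a single token, its inner `-d` is not mistaken for a separate option.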

The job fails and I get the following guppy log:

2022-08-22 06:31:06.579141 [guppy/message] ONT Guppy basecall server software version 6.2.1+6588110, client-server API version 11.0.0, minimap2 version 2.22-r1101
log path: /home/amuyle/male_megalodon_results_all_contexts/guppy_log
chunk size: 2000
chunks per runner: 512
max queued reads: 2000
num basecallers: 4
num socket threads: 2
max returned events: 50000
gpu device: cuda:0 cuda:1
kernel path:
runners per device: 4
Use of this software is permitted solely under the terms of the end user license agreement (EULA).
By running, copying or accessing this software, you are demonstrating your acceptance of the EULA.
The EULA may be found in /home/amuyle/ont-guppy_6.2.1/bin
2022-08-22 06:31:06.579653 [guppy/info] crashpad_handler not supported on this platform.
2022-08-22 06:31:06.580498 [guppy/info] Listening on port ipc:///tmp/ddf6-cc72-0497-139e.
2022-08-22 06:31:08.677890 [guppy/message] Config loaded:
config file: /home/amuyle/ont-guppy_6.2.1/data/res_dna_r941_min_modbases_5mC_v001.cfg
model file: /home/amuyle/ont-guppy_6.2.1/data/res_dna_r941_min_modbases_5mC_v001.jsn
model version id None
adapter scaler model file: /home/amuyle/ont-guppy_6.2.1/data/adapter_scaling_dna_r9.4.1_min.jsn
2022-08-22 06:31:08.860612 [guppy/info] CUDA device 0 (compute 7.0) initialised, memory limit 16945709056B (16623796224B free)
2022-08-22 06:31:08.960819 [guppy/info] CUDA device 1 (compute 7.0) initialised, memory limit 16945709056B (16623796224B free)
2022-08-22 06:31:08.967429 [guppy/info] lamp_arrangements arrangement folder not found: /home/amuyle/ont-guppy_6.2.1/data/read_splitting/lamp_arrangements
2022-08-22 06:31:09.126687 [guppy/message] Starting server on port: ipc:///tmp/ddf6-cc72-0497-139e
2022-08-22 06:31:09.139738 [guppy/info] client connection request. ["res_dna_r941_min_modbases_5mC_v001:>timeout_interval=15000>client_name=>barcode_kits=>detect_barcodes=0>move_and_trace_enabled=1>post_out=1"]
2022-08-22 06:31:09.141183 [guppy/info] New client connected Client 1 anonymous_client_1 id: 0a549260-51c4-4d0b-9088-5040248d105e (connection string = 'res_dna_r941_min_modbases_5mC_v001:>timeout_interval=15000>client_name=>barcode_kits=>detect_barcodes=0>move_and_trace_enabled=1>post_out=1').
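One thing that can be read off a log like this is which config the server actually loaded. The following is a minimal sketch, assuming the "Config loaded" entry keeps the format shown in this log, that extracts the config file name and compares it with the value given to `--guppy-config`. The helper name `loaded_config_name` is hypothetical. The log in this post may come from a separate run with the 5mC model, so a mismatch here is not necessarily the cause of the failure.

```python
import re

# Excerpt of the "Config loaded" entry from the guppy log in this post.
LOG_EXCERPT = (
    "2022-08-22 06:31:08.677890 [guppy/message] Config loaded: "
    "config file: /home/amuyle/ont-guppy_6.2.1/data/res_dna_r941_min_modbases_5mC_v001.cfg "
    "model file: /home/amuyle/ont-guppy_6.2.1/data/res_dna_r941_min_modbases_5mC_v001.jsn"
)


def loaded_config_name(log_text):
    """Return the basename of the first .cfg path after 'config file:', or None."""
    match = re.search(r"config file:\s*(\S+\.cfg)", log_text)
    if match is None:
        return None
    return match.group(1).rsplit("/", 1)[-1]


# Value passed to --guppy-config in the command from this post.
requested = "res_dna_r941_min_modbases-all-context_v001.cfg"
loaded = loaded_config_name(LOG_EXCERPT)

print("requested:", requested)
print("loaded:   ", loaded)
print("same config:", requested == loaded)
```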

marcus1487 commented 1 year ago

This is likely an issue with the Megalodon and Guppy interface. As Megalodon is being deprecated, I would recommend trying the latest Remora all-context model with Bonito. The next release of Remora will make model conversion much easier, so that the all-context model can be used with Guppy as well (and Dorado in the near future).

AlineMuyle commented 1 year ago

Could you please explain what you mean by the latest Remora all-context model? I just reinstalled Remora, and the command 'remora model list_pretrained' only shows CG models.

AlineMuyle commented 1 year ago

I think I found it! https://github.com/nanoporetech/rerio#remora-models Thank you for your help.