nanoporetech / megalodon

Megalodon is a research command line tool to extract high accuracy modified base and sequence variant calls from raw nanopore reads by anchoring the information rich basecalling neural network output to a reference genome/transcriptome.

Aggregation does not start #219

Closed amauryavril closed 2 years ago

amauryavril commented 3 years ago

Hello,

I am running Megalodon and everything is working well until the aggregation step:

[21:05:11] Loading reference
[21:05:11] Loaded model calls canonical alphabet ACGT and modified bases h=5hmC (alt to C); m=5mC (alt to C)
[21:05:11] Preparing workers to process reads
[21:05:11] Processing reads
Full output or empty input queues indicate I/O bottleneck
3 most common unsuccessful processing stages:
    -----                                                                                                                          [2021-11-17 21:05:11.236992] [0x00007fc2e3c8a700] [info]    Connecting to server as ['']
[2021-11-17 21:05:11.238081] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 9e5f16ce-5574-43e2-8c20-7d0a348fe653rocessing: 0reads [00:00, ?reads/s]
[2021-11-17 21:05:11.238457] [0x00007fc2e3c8a700] [info]    Connecting to server as ''                                     | 0/10000
[2021-11-17 21:05:11.239343] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 419c9920-9189-4670-a25f-ae1897a14bc1
[2021-11-17 21:05:11.240303] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.241423] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 658e86be-9376-4a5a-9eaa-68965fa2cda7
[2021-11-17 21:05:11.243546] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.245140] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.246201] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 88412a4b-bdb2-4cab-a86a-89ea03ac3f2e
[2021-11-17 21:05:11.247007] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.247169] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.247243] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.248380] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: a009bd13-ed8c-4ad9-b04b-8bc28db58533
[2021-11-17 21:05:11.249145] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 31e930fb-11c9-4845-ab69-dc30838edb19
[2021-11-17 21:05:11.249786] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 990e9308-5d25-4523-81f0-97d5c5a78389
[2021-11-17 21:05:11.254987] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.254992] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.256282] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 3afb737c-2f36-4487-8af5-1fe1893ef4f1
[2021-11-17 21:05:11.256655] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: cc57142e-303c-4239-a8f5-309cd293797f
[2021-11-17 21:05:11.258981] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.258992] [0x00007fc2e3c8a700] [info]    Connecting to server as ''
[2021-11-17 21:05:11.260049] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 7157216f-e442-4520-8b96-3be3100d1814
[2021-11-17 21:05:11.283840] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: b24cf791-6480-4b34-944c-ecc4eeb7305d
[2021-11-17 21:05:11.303739] [0x00007fc2e3c8a700] [info]    Connected to server as ''. Connection id: 4d6713be-833b-4ed2-b170-ae89305cc173
     1.3% (  15096 reads) : No alignment
[22:10:43] Waiting for mods database to complete indexing
     1.3% (  15096 reads) : No alignment                                                                    
     0.0% (    246 reads) : Unexpected error                                                                ds/s, samples/s=3.45e+6]
    -----eue capacity extract_signal      : 100%|██████████████████████████████████████████████████████████████████████| 10000/10000
Read Processing: 100%|███████████████████████████████████████████| 1291014/1291014 [1:05:31<00:00, 328.35reads/s, samples/s=3.45e+6]
 input queue capacity extract_signal      :   0%|                                                                          | 0/10000
output queue capacity per_read_mods       :   0%|                                                                         | 11/10000
******************** WARNING: Unexpected errors occured. See full error stack traces for first (up to) 50 errors in "unexpected_megalodon_errors.8992.err" ********************
[22:10:43] Unsuccessful processing types:
     1.3% (  16157 reads) : No alignment                                                                    
     0.0% (    262 reads) : Unexpected error                                                                
[22:15:21] Spawning modified base aggregation processes
[22:15:21] Aggregating 362826850 per-read modified base statistics
[22:15:21] NOTE: If this step is very slow, ensure the output directory is located on a fast read disk (e.g. local SSD). Aggregation can be restarted using the `megalodon_extras aggregate run` command
Mods:   0%|                                                                              | 0/362826850 [00:00<?, ? per-read calls/s]

The aggregation still has not started after 10 hours in this state. I tried the megalodon_extras aggregate run command, but the same problem occurs. The .bed and .vcf files are created, but they remain empty. The first step did work, since I have usable BAM and .db files. Do you know how to solve this? Thanks!

For this project, I barcoded several samples and ran them together. I basecalled with Guppy and demultiplexed the fast5 files according to the Guppy sequencing summary, using demux_fast5 from ont_fast5_api. I now want to run Megalodon on each demultiplexed fast5 directory. Here is the command I used:

MODEL="res_dna_r941_min_modbases_5mC_5hmC_v001.cfg"
REFERENCE="/reference_genome/genome.fa"

megalodon \
                /Demultiplex_fast5/sample1 \
                --guppy-params "-d /home/gecf/rerio/basecall_models/ --chunk_size 1000" \
                --guppy-config $MODEL \
                --guppy-server-path /usr/bin/guppy_basecall_server \
                --outputs mod_mappings mods per_read_mods \
                --reference $REFERENCE \
                --mod-output-formats bedmethyl modvcf \
                --output-directory /out/sample1 \
                --devices 0 \
                --processes 15

I am running this command on a computer with Ubuntu 20.04, a 16-core i7, and a GeForce RTX 3080 Ti. Program versions are:

Guppy v5.0.16, Megalodon v2.3.5, Python 3.8.10

Thanks!

marcus1487 commented 2 years ago

This generally occurs when the output directory is on a network file system. Could you move the Megalodon output to a local disk and try running megalodon_extras aggregate run again?
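
To check whether the output directory is in fact on a network file system, something like the following may help. This is only a rough sketch and not part of Megalodon: the function names are made up for illustration, the list of network file system types is an assumption and certainly incomplete, and `df --output=fstype` requires GNU coreutils (as shipped with Ubuntu).

```shell
#!/bin/sh
# Sketch only: classify the file system type of a Megalodon output directory.

# Return 0 when a file system type name looks like a network file system.
# This list is an assumption for illustration and is not exhaustive.
is_network_fs_type() {
    case "$1" in
        nfs*|cifs|smb*|fuse.sshfs|lustre|gpfs|ceph|glusterfs|fuse.glusterfs|beegfs)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Print the file system type of the given path (needs GNU coreutils df).
fs_type_of() {
    df --output=fstype "$1" | tail -n 1
}

# Report whether a directory appears to be on a network file system.
check_output_dir() {
    fs="$(fs_type_of "$1")"
    if is_network_fs_type "$fs"; then
        echo "$1 is on $fs (network file system): consider moving it to local disk"
    else
        echo "$1 is on $fs (looks local)"
    fi
}
```

For the run above this would be called as `check_output_dir /out/sample1`. If it reports a network file system, the directory could be copied to a local path (for example a hypothetical `/scratch/sample1`) and aggregation restarted there with `megalodon_extras aggregate run`, pointing it at the copied directory. The exact options are best confirmed with `megalodon_extras aggregate run --help`.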