struct.error: 'i' format requires -2147483648 <= number <= 2147483647

amauryavril commented 3 years ago

Hello,

I have already run Megalodon successfully on CpG only, but when I try to call all modified C, it throws this error message:

Read Processing: 100%|███████████████████████████████████████████| 1854740/1854740 [1:10:54<00:00, 435.99reads/s, samples/s=4.54e+6]
input queue capacity extract_signal : 0%| | 0/10000
output queue capacity basecalls : 0%| | 0/10000
output queue capacity mappings : 0%| | 0/10000
output queue capacity per_read_mods : 0%| | 0/10000
[10:50:15] Unsuccessful processing types:
60.4% (1119798 reads) : No alignment
[10:50:15] Waiting for mods database to complete indexing
[10:51:03] Spawning modified base aggregation processes
[10:51:03] Aggregating 271235986 per-read modified base statistics
[10:51:03] NOTE: If this step is very slow, ensure the output directory is located on a fast read disk (e.g. local SSD). Aggregation can be restarted using the `megalodon_extras aggregate run` command
Mods: 0%| | 0/271235986 [00:00<?, ? per-read calls/s]
Traceback (most recent call last):
  File "/home/gecf/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
  File "/home/gecf/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/gecf/anaconda3/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Traceback (most recent call last):
  File "/home/gecf/anaconda3/lib/python3.7/multiprocessing/queues.py", line 242, in _feed
    send_bytes(obj)
  File "/home/gecf/anaconda3/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/gecf/anaconda3/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Mods: 7%|███▉ | 18121400/271235986 [05:08<1:11:53, 58680.46 per-read calls/s]
[10:56:12] Mega Done

It seems to be related to multiprocessing, but I'm not sure how to solve it. The basecalling seems to work, though, and some of the output is written, but not all of it. Do you know what the issue is? Thank you!

Here is the Guppy log file that was in my megalodon output folder: guppy_basecall_server_log-2021-11-01_13-42-52.log

For the Megalodon log.txt file, since it is very big, here are the first lines:

[13:42:52] Running Megalodon version 2.3.4
DBG 13:42:52 : Command: """/home/gecf/anaconda3/bin/megalodon /media/gecf/Data/Raw_data/Minion0008/minixid00010/20211022_1148_MN31779_FAQ95879_94a74f62/fast5 --guppy-params -d /home/gecf/rerio/basecall_models/ --chunk_size 100 --guppy-config res_dna_r941_min_modbases_5mC_5hmC_CpG_v001.cfg --guppy-server-path /usr/bin/guppy_basecall_server --outputs basecalls mappings mod_mappings mods --reference /home/gecf/Desktop/script_basecalling/reference_genome/control_DNA.fa --mod-output-formats bedmethyl --mod-min-prob 0.01 --output-directory /home/gecf/Desktop/mega_out --devices 0 --processes 16""" --- MainProcess-MainThread megalodon.py:1750
[13:42:52] Loading guppy basecalling backend
DBG 13:42:52 : Guppy version: "5.0.16" --- MainProcess-MainThread backends.py:869
DBG 13:42:52 : Pyguppy version: "5.0.16" --- MainProcess-MainThread backends.py:870
DBG 13:42:52 : guppy server init command: "/usr/bin/guppy_basecall_server -p auto -l /home/gecf/Desktop/mega_out/guppy_log -c res_dna_r941_min_modbases_5mC_5hmC_CpG_v001.cfg --post_out --quiet -x cuda:0 -d /home/gecf/rerio/basecall_models/ --chunk_size 100" --- MainProcess-MainThread backends.py:945
DBG 13:42:52 : Found guppy log file: /home/gecf/Desktop/mega_out/guppy_log/guppy_basecall_server_log-2021-11-01_13-42-52.log --- MainProcess-MainThread backends.py:959
DBG 13:42:54 : Connecting to server --- MainProcess-MainThread backends.py:803
DBG 13:42:54 : pyguppy server status: result.success --- MainProcess-MainThread backends.py:840
DBG 13:42:54 : pyguppy server config: {'/home/gecf/rerio/basecall_models/res_dna_r941_min_modbases_5mC_5hmC_CpG_v001.cfg': {'Basecalling': {'ModelFile': 'res_dna_r941_min_modbases_5mC_5hmC_CpG_v001.jsn', 'res_dna_r941_min_modbases_5mC_5hmC_CpG_v001.jsn': {'version': {}}}}, 'config load results': {'m_name': 'config_loader', '/home/gecf/rerio/basecall_models/res_dna_r941_min_modbases_5mC_5hmC_CpG_v001.cfg': {'status': 'loaded', 'failed_reason': ''}}} --- MainProcess-MainThread backends.py:842
DBG 13:42:54 : init_test_read BasecallingCompleted --- MainProcess-MainThread backends.py:1110
[13:42:54] Loading reference
******************** WARNING: "mods" output requested, so "per_read_mods" will be added to outputs. ********************
[13:42:54] Loaded model calls canonical alphabet ACGT and modified bases m=5mC (alt to C); h=5hmC (alt to C)
[13:42:54] Preparing workers to process reads
DBG 13:42:54 : Starting --- FileFiller-MainThread fast5_io.py:332
DBG 13:42:54 : Starting --- FileFiller-FileEnum fast5_io.py:303
DBG 13:42:54 : Starting --- FileFiller-ReadEnumThread000 fast5_io.py:287

marcus1487 commented 3 years ago

I have seen similar issues before. I believe they are related to the maximum message size of Python's core multiprocessing connections being too small. Which version of Python are you using, and was it installed in a custom manner?
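
For readers who hit the same traceback, here is a minimal sketch of the limit it points at. The failing line in `connection.py` is `struct.pack("!i", n)`, which packs the message length into a signed 32-bit header, so any single message longer than 2147483647 bytes cannot be described. The helper names below (`pack_length_header`, `header_overflows`) are hypothetical and exist only for illustration; they are not part of Megalodon or of the `multiprocessing` module. As far as I know, CPython 3.8 and later use a wider header for larger messages, but treat that as an assumption to verify against your own interpreter.

```python
import struct

# Largest length that fits in the signed 32-bit "!i" header from the traceback.
MAX_I32 = 2**31 - 1


def pack_length_header(n: int) -> bytes:
    """Pack a message length with the same format string as the traceback line.

    Hypothetical helper for illustration only.
    """
    return struct.pack("!i", n)


def header_overflows(n: int) -> bool:
    """Return True if a message of n bytes cannot be described by an "!i" header."""
    try:
        pack_length_header(n)
    except struct.error:
        return True
    return False


if __name__ == "__main__":
    # The boundary value still packs; one byte more raises struct.error.
    print("2**31 - 1 overflows:", header_overflows(MAX_I32))
    print("2**31 overflows:", header_overflows(MAX_I32 + 1))
```

Only the length is packed here, so the sketch needs no large allocation to show where the boundary sits.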

amauryavril commented 3 years ago

Hello Marcus,

I have python 3.7.0 installed on my computer, which I think I installed manually with conda: conda install -c anaconda python=3.7

Thank you for your help!

marcus1487 commented 3 years ago

Could you try an installation outside of conda? I think the other times I've seen this it was also a conda python install.
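
To answer the version and install questions quickly, a small check like the following may help. It is only a sketch: the function name `describe_python` is made up for this example, and the `conda-meta` directory test is a common heuristic for spotting a conda-managed prefix, not an official API.

```python
import os
import sys


def describe_python() -> dict:
    """Summarise the running interpreter (hypothetical helper for illustration)."""
    prefix = sys.prefix
    return {
        # Three-part version string, e.g. "3.7.0".
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "executable": sys.executable,
        "prefix": prefix,
        # Heuristic: conda environments keep package metadata in <prefix>/conda-meta.
        "looks_like_conda": os.path.isdir(os.path.join(prefix, "conda-meta")),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that launches `megalodon`, so that the report describes the environment actually in use.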

amauryavril commented 3 years ago

Hello Marcus,

Sorry for the delay. I did a fresh install of python and megalodon outside of conda and it works perfectly. Thank you!

nanoporetech / megalodon, issue #209