wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

issue #181 - struct.error: 'i' format requires -2147483648 <= number <= 2147483647 #235

Closed: shimbalama closed this issue 3 years ago

shimbalama commented 3 years ago

Hi Wouter,

The --huge option doesn't solve this problem for me. I have used NanoPlot for 2 years now without issue (thanks!), and all my BAMs were created the same way (NGMLR to map ONT reads). The BAM in question is 222GB.

Is this something you could look into?

Ty, Liam

wdecoster commented 3 years ago

Hi Liam,

I can try :-) Which organism are you working on? Do you have a log file of one of the erroneous runs?

Cheers, Wouter

shimbalama commented 3 years ago

Thanks Wouter!

I'm working on human samples (cancer, so ~60x depth).

Log:

Process Process-1:
Traceback (most recent call last):
  File "/software/python/Python-3.6.1/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/software/python/Python-3.6.1/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/software/python/Python-3.6.1/lib/python3.6/concurrent/futures/process.py", line 181, in _process_worker
    result=r))
  File "/software/python/Python-3.6.1/lib/python3.6/multiprocessing/queues.py", line 355, in put
    self._writer.send_bytes(obj)
  File "/software/python/Python-3.6.1/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/software/python/Python-3.6.1/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
Traceback (most recent call last):
  File "/software/python/Python-3.6.1/bin/NanoPlot", line 10, in
    sys.exit(main())
  File "/software/python/Python-3.6.1/lib/python3.6/site-packages/nanoplot/NanoPlot.py", line 63, in main
    keep_supp=not(args.no_supplementary))
  File "/software/python/Python-3.6.1/lib/python3.6/site-packages/nanoget/nanoget.py", line 83, in get_input
    keep_supp=keep_supp)
  File "/software/python/Python-3.6.1/lib/python3.6/site-packages/nanoget/extraction_functions.py", line 166, in process_bam
    repeat(bam), unit, repeat(kwargs["keep_supp"]))
  File "/software/python/Python-3.6.1/lib/python3.6/site-packages/nanoget/extraction_functions.py", line 165, in
    data=[res for sublist in executor.map(extract_from_bam,
  File "/software/python/Python-3.6.1/lib/python3.6/concurrent/futures/_base.py", line 556, in result_iterator
    yield future.result()
  File "/software/python/Python-3.6.1/lib/python3.6/concurrent/futures/_base.py", line 405, in result
    return self.__get_result()
  File "/software/python/Python-3.6.1/lib/python3.6/concurrent/futures/_base.py", line 357, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
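For context on the error itself: the frame that fails is struct.pack("!i", n) in multiprocessing/connection.py. In the Python 3.6 install shown in the log, every message a worker process sends back to the parent is prefixed with its length packed as a signed 32-bit integer, so a pickled result of 2**31 bytes (2 GiB) or more cannot be described by that header. The worker exits with struct.error, and the executor in the parent then reports BrokenProcessPool. Below is a minimal illustration of that limit using only the standard struct module; the helper names pack_length_header and header_fits are hypothetical and are not part of multiprocessing or nanoget.

```python
import struct

# Largest value a signed 32-bit int can hold; same bound as in the struct.error above.
INT32_MAX = 2**31 - 1

def pack_length_header(n: int) -> bytes:
    """Hypothetical helper: pack a message length as a network-order
    signed 32-bit int, mirroring struct.pack("!i", n) in the traceback."""
    return struct.pack("!i", n)

def header_fits(n: int) -> bool:
    """Hypothetical helper: True if a payload of n bytes fits the header."""
    try:
        pack_length_header(n)
        return True
    except struct.error:
        return False

if __name__ == "__main__":
    print(header_fits(INT32_MAX))      # largest length that still fits
    print(header_fits(INT32_MAX + 1))  # a 2 GiB payload overflows the header
```

As far as I know, Python 3.8 and later send a 64-bit length for messages this large, so the ceiling mainly bites on older interpreters such as the 3.6.1 in this log. It can also be sidestepped by keeping each worker's returned result well under 2 GiB.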

ty

wdecoster commented 3 years ago

Thanks for your patience. I think I have solved this in nanoget v1.16, which is now available on PyPI and will probably be on conda soon. Could you test again?

I am planning a more thorough overhaul of the parsing functions and migration to faster alternatives, but that will take longer.
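The thread does not describe what changed in nanoget v1.16. One generic way to stay under the 32-bit message ceiling, independent of that fix, is to have workers return several bounded payloads instead of one huge list. The sketch below splits a stream of per-read records into consecutive batches whose summed per-record pickled size stays under a byte budget. The function batch_by_pickled_size, the Record layout, and the 256 MiB default are hypothetical choices for illustration, not nanoget code.

```python
import pickle
from typing import Iterable, Iterator, List, Tuple

# Hypothetical per-read record: (read_id, aligned_length, mean_quality).
Record = Tuple[str, int, float]

# Illustrative budget, chosen to stay far below the 2**31 - 1 byte ceiling
# of the 32-bit length header used by older multiprocessing versions.
DEFAULT_BUDGET = 256 * 1024 * 1024  # 256 MiB

def batch_by_pickled_size(records: Iterable[Record],
                          budget: int = DEFAULT_BUDGET) -> Iterator[List[Record]]:
    """Yield consecutive batches of records, starting a new batch before the
    summed per-record pickled size would exceed `budget`.

    A single record larger than `budget` still gets a batch of its own.
    """
    batch: List[Record] = []
    size = 0
    for rec in records:
        rec_size = len(pickle.dumps(rec, protocol=pickle.HIGHEST_PROTOCOL))
        if batch and size + rec_size > budget:
            yield batch
            batch, size = [], 0
        batch.append(rec)
        size += rec_size
    if batch:
        yield batch

if __name__ == "__main__":
    # Synthetic records, for demonstration only.
    records = [(f"read{i}", 1000 + i, 12.5) for i in range(10_000)]
    batches = list(batch_by_pickled_size(records, budget=50_000))
    print(len(batches), "batches")
```

Each batch could then be returned by its own task (for example via concurrent.futures.ProcessPoolExecutor.map) or written to disk by the worker rather than sent back whole. Pickling every record an extra time is wasteful on a large BAM, and returning compact numeric arrays per chunk would likely be cheaper, but the principle of bounding each message is the same.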