heche-psb / wgd

wgd v2: a suite of tools to uncover and date ancient polyploidy and whole-genome duplication
https://wgdv2.readthedocs.io/en/latest/
GNU General Public License v3.0

wgd dmd - struct.error: 'i' format requires -2147483648 <= number <= 2147483647 #40

Open erika-r-moore opened 1 week ago

erika-r-moore commented 1 week ago

Hello!

I am trying to run wgd v2 on some transcriptome data. I have successfully run wgd dmd on each sample independently (e.g., wgd dmd Sample1.fasta -of). However, when I run it pairwise (e.g., wgd dmd Sample1.fasta Sample2.fasta -of), I get this error for some samples:

Traceback (most recent call last):
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/bin/wgd", line 8, in <module>
    sys.exit(cli())
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/cli.py", line 117, in dmd
    _dmd(**kwargs)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/cli.py", line 155, in _dmd
    Parallel(n_jobs=nthreads,backend='multiprocessing')(delayed(parallelrbh)(s,i,j,ogformat,cscore,eval) for i,j in pairs)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/joblib/parallel.py", line 789, in __call__
    self.retrieve()
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/site-packages/joblib/pool.py", line 372, in send
    self._writer.send_bytes(buffer.getvalue())
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/ermoore3/miniconda2/envs/mamba/envs/wgdv2/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I suspect this error occurs because the input files are too large: when running wgd dmd on a single species (e.g., wgd dmd Sample1.fasta -of), the resulting .tsv files for the failing samples are ~8x larger than those for the samples that ran successfully.
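For reference, here is a minimal sketch of what I think the last frame of the traceback is doing. It only reproduces the `struct.pack("!i", n)` call from multiprocessing/connection.py, which packs the message length into a signed 32-bit header; it does not call wgd itself. The helpers `header_fits` and `pickled_size` are hypothetical names I made up for illustration, and I am assuming (not certain) that the oversized message is the pickled task sent to a worker.

```python
import pickle
import struct

# Largest value a signed 32-bit big-endian header ("!i") can hold.
MAX_I32 = 2**31 - 1


def header_fits(nbytes):
    """Return True if a message of nbytes can be described by a "!i" header."""
    try:
        struct.pack("!i", nbytes)
        return True
    except struct.error:
        # Same exception type as in the traceback above.
        return False


def pickled_size(obj):
    """Size in bytes of obj once pickled (hypothetical helper, not part of wgd)."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


print(header_fits(MAX_I32))      # True
print(header_fits(MAX_I32 + 1))  # False
```

If this reading is right, any single object sent to a worker whose pickled size exceeds 2147483647 bytes (about 2 GiB) would trigger the error, which would be consistent with it only appearing for the larger samples.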

Do you believe I am correct? If so, do you have any suggestions on how to fix this?

Any help is appreciated!

Best, Erika