kr-colab / diploSHIC

feature-based deep learning for the identification of selective sweeps
MIT License

Example workflow requires a huge amount of RAM #34

Closed: molpopgen closed this issue 3 years ago

molpopgen commented 3 years ago

The workflow in examples crashes on workstations with 64 GB of RAM, which makes it hard to get started without a cluster or whatnot. Reducing Ne to 1e5 helps a lot.

/usr/bin/time -f "%e %M" snakemake -j 1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    649 calc_sim_fvs
    1   concat_fvecs
    216 discoal_hard_simulation
    216 discoal_neutral_simulation
    217 discoal_soft_simulation
    1   make_training_sets
    1   train_classifier
    1302

[Mon Feb  8 15:04:49 2021]
Job 662: hard simulation stage

Job counts:
    count   jobs
    1   discoal_hard_simulation
    1
[Mon Feb  8 15:07:12 2021]
Finished job 662.
1 of 1302 steps (0.08%) done

[Mon Feb  8 15:07:12 2021]
rule calc_sim_fvs:
    input: test/discoal.hard.0.0.out
    output: test/discoal.hard.0.0.out.fvec
    jobid: 661
    wildcards: tDir=test, mod=hard, x=0, i=0

Job counts:
    count   jobs
    1   calc_sim_fvs
    1
/usr/bin/python3 /home/kevin/src/diploSHIC/makeFeatureVecsForSingleMs_ogSHIC.py test/discoal.hard.0.0.out 55000 11 none None none 0.25 0.0 None test/discoal.hard.0.0.out.fvec
file name='test/discoal.hard.0.0.out'maskFileName='none': not doing any masking!
[Mon Feb  8 15:07:14 2021]
Finished job 661.
2 of 1302 steps (0.15%) done

[Mon Feb  8 15:07:14 2021]
Job 542: hard simulation stage

Job counts:
    count   jobs
    1   discoal_hard_simulation
    1
    /usr/bin/bash: line 1: 326865 Killed                  /home/kevin/src/discoal/discoal 10 10 55000 -Pt 130.79999999999998 1307.9999999999998 -Pre 3597.0 10791.0 -ws 0 -Pa 200.000000 10000.000000 -Pu 0 0.050000 -x 0.5 > train/discoal.hard.5.0.out
[Mon Feb  8 15:08:37 2021]
Error in rule discoal_hard_simulation:
    jobid: 0
    output: train/discoal.hard.5.0.out

RuleException:
CalledProcessError in line 87 of /home/kevin/src/diploSHIC/examples/Snakefile:
Command 'set -euo pipefail;  /home/kevin/src/discoal/discoal 10 10 55000 -Pt 130.79999999999998 1307.9999999999998 -Pre 3597.0 10791.0  -ws 0 -Pa 200.000000 10000.000000 -Pu 0 0.050000 -x 0.5 > train/discoal.hard.5.0.out' returned non-zero exit status 137.
  File "/home/kevin/.local/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2319, in run_wrapper
  File "/home/kevin/src/diploSHIC/examples/Snakefile", line 87, in __rule_discoal_hard_simulation
  File "/home/kevin/.local/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
  File "/home/kevin/.local/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/kevin/.local/lib/python3.8/site-packages/snakemake/executors/__init__.py", line 2350, in run_wrapper
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/kevin/src/diploSHIC/examples/.snakemake/log/2021-02-08T150448.609373.snakemake.log
Command exited with non-zero status 1
230.32 62912924
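
For anyone reading the log above: the final line is what the `-f "%e %M"` format passed to `/usr/bin/time` prints, i.e. elapsed wall-clock seconds followed by the maximum resident set size in kilobytes. The exit status 137 follows the usual shell convention of 128 plus a signal number, and signal 9 is SIGKILL. That is consistent with the kernel's out-of-memory killer, although the log itself does not say what sent the signal. A minimal sketch for decoding both values, assuming a POSIX system (the function names are illustrative, not part of diploSHIC or Snakemake):

```python
import signal


def parse_gnu_time(line):
    """Parse a '%e %M' line from GNU time: elapsed seconds and peak RSS in KiB."""
    elapsed_s, max_rss_kib = line.split()
    return float(elapsed_s), int(max_rss_kib)


def signal_from_exit_status(status):
    """Return the signal name for a shell exit status above 128, else None."""
    if status > 128:
        return signal.Signals(status - 128).name
    return None


# Values copied from the log above.
elapsed_s, max_rss_kib = parse_gnu_time("230.32 62912924")
peak_gib = max_rss_kib / 1024**2  # KiB -> GiB

print(f"elapsed: {elapsed_s:.0f} s, peak RSS: {peak_gib:.1f} GiB")
print(signal_from_exit_status(137))
```

The peak works out to roughly 60 GiB, which lines up with the job being killed on a 64 GB machine.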

andrewkern commented 3 years ago

moving these to msprime soon is gonna help even more!

molpopgen commented 3 years ago

> moving these to msprime soon is gonna help even more!

But what about in the meantime? msprime 1.0 is still a little ways off, and this repo will need its packaging reworked once it is out, etc.

andrewkern commented 3 years ago

sure. i can set it to ne=1e5 for the example workflow.
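
Some context on why a smaller Ne should help: discoal, like ms, takes population-scaled parameters, with theta = 4·Ne·μ·L and rho = 4·Ne·r·L for a locus of L sites, so both shrink in direct proportion to Ne. Fewer mutation and recombination events per simulated replicate presumably mean less state to hold in memory. The sketch below only illustrates the scaling. The per-base mutation rate and the "before" Ne are hypothetical placeholders, since the thread does not state the values in the example Snakefile; only the 55000 sites and the target Ne of 1e5 come from the thread.

```python
import math


def scaled_rate(ne, per_base_rate, n_sites):
    """Population-scaled rate for a locus: 4 * Ne * rate * L (ms/discoal convention)."""
    return 4 * ne * per_base_rate * n_sites


N_SITES = 55000   # locus length from the discoal command line in the log
MU = 1e-8         # hypothetical per-base mutation rate, for illustration only
NE_BEFORE = 1e6   # hypothetical original Ne; not stated in the thread
NE_AFTER = 1e5    # value proposed in the thread

theta_before = scaled_rate(NE_BEFORE, MU, N_SITES)
theta_after = scaled_rate(NE_AFTER, MU, N_SITES)

# The ratio of the scaled parameters equals the ratio of the Ne values.
ratio = theta_before / theta_after
print(f"theta shrinks by a factor of about {ratio:.0f}")
```

The same proportionality applies to rho, with a per-base recombination rate in place of `MU`.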