pheweb process for large datasets

Shicheng-Guo commented 4 years ago

Hi Peter,

Thanks for the solution for running pheweb process on large datasets. However, I still don't know how to do it. I have a very large dataset of 7000 summary statistic files, each about 500M. I want to run pheweb process on a Slurm HPC platform. Can you give an explicit example of running pheweb process on large datasets?

The README says: "This step can take hours or days for large datasets. If you want to use the SLURM cluster scheduler, run pheweb slurm-parse for parsing and then pheweb process --no-parse for everything else. To use a different cluster scheduler, modify the file written by pheweb slurm-parse to support your scheduler." Thanks.

Shicheng

pjvandehaar commented 4 years ago

Did you try running pheweb slurm-parse (and following the instructions it prints) and then running pheweb process --no-parse? It sounds like those instructions aren't clear, but I'm not sure how to write them better. If you were willing to rewrite that paragraph I'd appreciate it and update the README.

Shicheng-Guo commented 4 years ago

Yes. I tried pheweb slurm-parse, and here is the error I received:

sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)

The full details are below:

(base) [sguo2@comet-ln3 pheweb]$ pheweb slurm-parse
Run:
sbatch /projects/dsci-csb/user/sguo2/pheweb/generated-by-pheweb/tmp/slurm-parse-2020-07-19T11-17-35.451141.sh

Monitor with `squeue --long --array --job <jobid>`

output will be in /projects/dsci-csb/user/sguo2/pheweb/generated-by-pheweb/tmp//slurm-*.out

Then I ran sbatch /projects/dsci-csb/user/sguo2/pheweb/generated-by-pheweb/tmp/slurm-parse-2020-07-19T11-17-35.451141.sh

(base) [sguo2@comet-ln3 pheweb]$ sbatch /projects/dsci-csb/user/sguo2/pheweb/generated-by-pheweb/tmp/slurm-parse-2020-07-19T11-17-35.451141.sh
sbatch: error: bank_limit plugin: The requested time can not exceed the available balance.
Requested SUs: 8179910
Allocation limit group SUs: 9626302
Allocation limit user SUs: 9626302
Allocation used group SUs: 5776586
Allocation used user SUs: 13293
Allocation available group SUs: 3849716
Allocation available user SUs: 9613009
Allocation running/queued group SUs: 0
Allocation running/queued user SUs: 0
Allocation completed today group SUs: 0
Allocation completed today user SUs: 0
sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)

pjvandehaar commented 4 years ago

The file /projects/dsci-csb/user/sguo2/pheweb/generated-by-pheweb/tmp/slurm-parse-2020-07-19T11-17-35.451141.sh should contain the line

#SBATCH --time=5-0:0

It sounds like 5 days doesn't work for your account. Edit that line or talk with your sysadmin.
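
In the sbatch output above, the requested SUs (8179910) exceed the group's available SUs (3849716), so a shorter time request will presumably lower the charge enough for the job to be accepted. Here is a minimal sketch of editing that line from the shell. The 2-day value is only an assumption, so use whatever your allocation permits. The temporary file is a hypothetical stand-in for the script that pheweb slurm-parse writes, so in practice set SCRIPT to the path it prints.

```shell
# Hypothetical stand-in for the script written by `pheweb slurm-parse`.
# In practice, set SCRIPT to the path that command prints.
SCRIPT="$(mktemp)"
printf '#!/bin/bash\n#SBATCH --time=5-0:0\n' > "$SCRIPT"

# Assumed limit of 2 days; replace with what your account allows.
NEW_TIME="2-0:0"

# Rewrite the time request in place, keeping the original as $SCRIPT.bak.
sed -i.bak "s/^#SBATCH --time=.*/#SBATCH --time=${NEW_TIME}/" "$SCRIPT"

# Confirm the change before submitting.
grep '^#SBATCH --time=' "$SCRIPT"

# Then submit the job and, once the parse jobs have finished,
# run the remaining steps:
#   sbatch "$SCRIPT"
#   pheweb process --no-parse
```

If the parse jobs can't finish within the shorter limit, that is a question for your sysadmin rather than something this edit solves.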

statgen / pheweb

pheweb process for large datasets #139