statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

How to estimate time for `sites && pheweb make-gene-aliases-sqlite3` step? #177

Closed: Shicheng-Guo closed this issue 1 year ago

Shicheng-Guo commented 2 years ago

How can I estimate the time required for the step below? (I have 1108 summary statistics files, and each file has 26 million SNPs.)

pheweb sites && pheweb make-gene-aliases-sqlite3 && pheweb add-rsids && pheweb add-genes && pheweb make-cpras-rsids-sqlite3

Thanks, Shicheng
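
The thread doesn't give a formula for estimating this. One practical approach is to run the five steps as separate commands and log each one's wall-clock time, so the job log shows which step is slow instead of one long `&&` chain. A minimal bash sketch; `run_timed_steps` is a hypothetical helper, and the step names are copied from the command above.

```shell
# Hypothetical helper: run the PheWeb loading steps one at a time and
# print a timestamped start/finish line with elapsed seconds for each.
# Stops at the first failing step, like the original `&&` chain.
run_timed_steps() {
  local step start elapsed
  for step in sites make-gene-aliases-sqlite3 add-rsids add-genes make-cpras-rsids-sqlite3; do
    start=$(date +%s)
    echo "[$(date '+%F %T')] start: pheweb $step"
    if ! pheweb "$step"; then
      echo "[$(date '+%F %T')] FAILED: pheweb $step" >&2
      return 1
    fi
    elapsed=$(( $(date +%s) - start ))
    echo "[$(date '+%F %T')] done: pheweb $step (${elapsed}s)"
  done
}
```

Define the function in the job script, then call `run_timed_steps` in place of the chained command.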

pjvandehaar commented 2 years ago

Could you send a screenshot of your terminal? What's it show now? PheWeb usually tries to print its progress.

Shicheng-Guo commented 2 years ago

Nothing shows up in the terminal. I submit this step as a Slurm job, as below:

#!/bin/bash
#SBATCH -p shared
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=sguo2@its.jnj.com
#SBATCH --nodes 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --qos=shared-oneweek
#SBATCH --time 7-00:00:00
pheweb sites && pheweb make-gene-aliases-sqlite3 && pheweb add-rsids && pheweb add-genes && pheweb make-cpras-rsids-sqlite3
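
One reason output can be hard to find is that this script doesn't tell Slurm where to write it. A sketch of lines that could be added to the script above so PheWeb's progress lands in a predictable file; the job name `pheweb-load` is an arbitrary placeholder. `#SBATCH` lines must sit with the other directives, before the first command, or Slurm ignores them. PheWeb runs under Python, which buffers stdout when writing to a file rather than a terminal, so progress lines may appear late; `PYTHONUNBUFFERED=1` turns that buffering off.

```shell
# Extra lines for the job script (put the #SBATCH lines together with the
# other #SBATCH directives, before the first command).
# Slurm expands %x to the job name and %j to the job ID.
#SBATCH --job-name=pheweb-load
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
# Ask Python to write stdout/stderr unbuffered so progress shows up promptly.
export PYTHONUNBUFFERED=1
```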

I notice that lots of binary files (more than 96, named tmp-merging-xxxxxx) are being generated in generated-by-pheweb/tmp. This step has been running for 16 hours, and I am wondering how long it will take, since our maximum Slurm job run time is 7 days.

**cd /home/sguo2/janssen4/pheweb2/generated-by-pheweb/tmp**
drwxr-xr-x 2 sguo2 jan100 554K Sep  2 13:01 parsed
drwxr-xr-x 2 sguo2 jan100    0 Sep  2 13:14 sites
drwxr-xr-x 6 sguo2 jan100 2.0K Sep  2 13:14 .
drwxr-xr-x 2 sguo2 jan100 1.0K Sep  2 18:27 phenolist-backups
drwxr-xr-x 5 sguo2 jan100 7.0K Sep  2 18:28 ..
drwxr-xr-x 3 sguo2 jan100  74K Sep  2 18:28 tmp
(base) [sguo2@login02 generated-by-pheweb]$ cd tmp/
(base) [sguo2@login02 tmp]$ ll | tail
-rw-r--r-- 1 sguo2 jan100  62M Sep  3 11:07 tmp-merging-2279316710
-rw-r--r-- 1 sguo2 jan100 152M Sep  3 11:07 tmp-merging-8929684999
-rw-r--r-- 1 sguo2 jan100 166M Sep  3 11:07 tmp-merging-263313817
-rw-r--r-- 1 sguo2 jan100  72M Sep  3 11:07 tmp-merging-688921516
-rw-r--r-- 1 sguo2 jan100  71M Sep  3 11:07 tmp-merging-3823904954
-rw-r--r-- 1 sguo2 jan100  63M Sep  3 11:07 tmp-merging-8919461510
-rw-r--r-- 1 sguo2 jan100  63M Sep  3 11:08 tmp-merging-302413105
-rw-r--r-- 1 sguo2 jan100  63M Sep  3 11:08 tmp-merging-8950823609
-rw-r--r-- 1 sguo2 jan100  63M Sep  3 11:08 tmp-merging-3598860362
-rw-r--r-- 1 sguo2 jan100  62M Sep  3 11:08 tmp-merging-3634959663
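
A listing like this doesn't show how far along the step is. One rough way to check whether it is still progressing is to record the number and total size of the tmp-merging-* files over time. A minimal sketch; `merge_progress` is a hypothetical helper, and its default directory assumes it is run from the PheWeb data directory (the one containing generated-by-pheweb). It cannot give a real ETA, since the total number of merge files PheWeb will produce isn't known from this thread.

```shell
# Hypothetical helper: print a timestamp, the number of tmp-merging-* files,
# and their combined disk usage in KiB for the given directory.
merge_progress() {
  local dir=${1:-generated-by-pheweb/tmp}
  local n kb
  # Count matching regular files directly in $dir (no recursion).
  n=$(find "$dir" -maxdepth 1 -type f -name 'tmp-merging-*' | awk 'END {print NR}')
  # Sum `du -k` over the same files; prints 0 if none exist.
  kb=$(find "$dir" -maxdepth 1 -type f -name 'tmp-merging-*' -exec du -k {} + |
       awk '{s += $1} END {print s + 0}')
  echo "$(date '+%F %T') files=$n size_kb=$kb"
}
```

Sampling it periodically, e.g. `while true; do merge_progress; sleep 600; done`, shows whether the count and total size are still changing.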

pjvandehaar commented 2 years ago

Look at PheWeb's STDOUT. Maybe Slurm puts it in some default location.
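
If a script has no `--output` line, `sbatch` writes the job's stdout and stderr to `slurm-<jobid>.out` in the directory `sbatch` was run from. For a job the controller still knows about (running, or recently finished), `scontrol show job` reports the path in its `StdOut=` field. A small sketch that extracts it; `job_stdout` is a hypothetical name, and paths containing spaces are not handled.

```shell
# Hypothetical helper: print the stdout path Slurm recorded for a job.
# Find the job ID first with: squeue -u "$USER"
job_stdout() {
  scontrol show job "$1" | grep -o 'StdOut=[^ ]*' | cut -d= -f2-
}
```

Then, for example, `tail -f "$(job_stdout 123456)"`, where 123456 stands in for the real job ID.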

Did you tell PheWeb how many CPUs to use?

Shicheng-Guo commented 2 years ago

The settings are below. I didn't see anything indicating that this command uses multiple CPUs. Is there any way to make it faster? It has now been running for 4 days without producing anything useful except lots of tmp-merging-xxx files.

#SBATCH --nodes 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
pheweb sites && pheweb make-gene-aliases-sqlite3 && pheweb add-rsids && pheweb add-genes && pheweb make-cpras-rsids-sqlite3
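
This job requests a single CPU (`--cpus-per-task=1`), so even a step that can run in parallel gets only one core. A sketch of passing a larger allocation on to PheWeb. ASSUMPTION: it presumes your PheWeb version reads a `num_procs` setting from `config.py` in the data directory; check the documentation for your version before relying on it, and note that not every step necessarily parallelizes. `set_num_procs` is a hypothetical helper.

```shell
# Hypothetical helper: ensure config.py has exactly one `num_procs = N` line,
# replacing an existing one or appending a new one.
# ASSUMPTION: PheWeb reads `num_procs` from config.py (verify for your version).
set_num_procs() {
  local config=$1 n=$2
  touch "$config"
  if grep -q '^num_procs[[:space:]]*=' "$config"; then
    # Rewrite via a temp file to avoid GNU/BSD `sed -i` differences.
    sed "s/^num_procs[[:space:]]*=.*/num_procs = $n/" "$config" > "$config.tmp" &&
      mv "$config.tmp" "$config"
  else
    echo "num_procs = $n" >> "$config"
  fi
}
```

In the job script, raise the header to something like `#SBATCH --cpus-per-task=16` and call `set_num_procs config.py "${SLURM_CPUS_PER_TASK:-1}"` before the pheweb commands; Slurm sets `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is given.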

pjvandehaar commented 2 years ago

Look at pheweb's STDOUT.