statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

pheweb sites for `slurm cluster` #178

Closed Shicheng-Guo closed 1 year ago

Shicheng-Guo commented 2 years ago

We could also try running this step on a Slurm cluster to make it faster.

```
(base) [sguo2@login01 ~]$ pheweb sites -h
1. Extract all variants from each phenotype
2. Union them.
3. Write to /home/sguo2/generated-by-pheweb/sites/sites-unannotated.tsv

Usage:
  -h   print this message
  -f   run even if sites-unannotated.tsv is up-to-date
```
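
For illustration, the three steps in the help text can be sketched as below. This is a minimal sketch, not pheweb's actual implementation: the column names `chrom`, `pos`, `ref`, `alt` and the tab-separated layout are assumptions, and `shard` is a hypothetical helper showing how the per-phenotype work in step 1 might be divided among Slurm array tasks.

```python
import csv
import io

# Assumed header for the output file; adjust if your files differ.
KEY_COLUMNS = ("chrom", "pos", "ref", "alt")


def read_variants(handle):
    """Step 1: yield (chrom, pos, ref, alt) for each row of one phenotype file."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield (row["chrom"], int(row["pos"]), row["ref"], row["alt"])


def shard(paths, task_id, task_count):
    """Hypothetical helper: the input files one Slurm array task would handle."""
    return paths[task_id::task_count]


def union_variants(handles):
    """Step 2: union the variants seen in any of the inputs."""
    seen = set()
    for handle in handles:
        seen.update(read_variants(handle))
    return seen


def sort_key(variant):
    chrom, pos, ref, alt = variant
    # Numeric chromosomes first (by number), then the rest alphabetically.
    if chrom.isdigit():
        return (0, int(chrom), "", pos, ref, alt)
    return (1, 0, chrom, pos, ref, alt)


def write_sites(variants, out):
    """Step 3: write the sorted union as a tab-separated file with a header."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(KEY_COLUMNS)
    for variant in sorted(variants, key=sort_key):
        writer.writerow(variant)
```

In a Slurm array job, `task_id` and `task_count` could be read from the `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` environment variables. Each task would write a partial union, and one final job would union the partial files. Because a set union does not depend on the order of its inputs, the result should be the same however the files are sharded.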

pjvandehaar commented 2 years ago

I haven't thought of any good way to do it. Do you have a suggestion?

Shicheng-Guo commented 2 years ago

Okay, I found a good solution for my own case. All my summary statistics come from the UKB-GWS dataset, so I can create my own sites-unannotated.tsv rather than having `pheweb sites` generate it. `pheweb sites` loops over 2000+ summary statistics (WGS-GWAS) files, which takes quite a long time.

I am wondering what will happen if my own sites-unannotated.tsv has some extra SNPs, since REGENIE ignores/removes SNPs whose MAF is below the threshold.
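
How pheweb treats extra sites is not answered in this thread, but the size of the mismatch can at least be measured. The sketch below compares a hand-made sites file against one phenotype's summary statistics, assuming both are tab-separated with `chrom`, `pos`, `ref`, `alt` columns. The column names and the `compare_sites` helper are assumptions for illustration.

```python
import csv
import io


def load_keys(handle):
    """Return the set of (chrom, pos, ref, alt) keys in a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(r["chrom"], r["pos"], r["ref"], r["alt"]) for r in reader}


def compare_sites(sites_handle, pheno_handle):
    """Report variants present on only one side of the comparison."""
    sites = load_keys(sites_handle)
    pheno = load_keys(pheno_handle)
    return {
        # In the custom sites file but absent from this phenotype,
        # e.g. dropped upstream by a MAF filter.
        "extra_in_sites": sites - pheno,
        # In this phenotype but absent from the custom sites file.
        "missing_from_sites": pheno - sites,
    }
```

It is worth checking both directions: variants that are extra in the sites file, and variants in a phenotype file that the sites file lacks.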

Shicheng

Shicheng-Guo commented 2 years ago

Interesting. It looks like all the files in /generated-by-pheweb/sites are binary files, which was not the case in the old pheweb version.

```
-rw-r--r-- 1 sguo2 jan100 104K Sep  6 21:39 cpras-rsids.sqlite3
-rw-r--r-- 1 sguo2 jan100  18K Sep  6 21:39 sites-rsids.tsv
-rw-r--r-- 1 sguo2 jan100  13K Sep  6 21:39 sites-unannotated.tsv
-rw-r--r-- 1 sguo2 jan100  19K Sep  6 21:39 sites.tsv
```

In some old pheweb versions, sites-unannotated.tsv is a plain-text file, which I think we can prepare with our own script.
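
A quick way to confirm whether a given file is really binary is to test whether its first bytes decode as UTF-8 text. The sketch below is a generic check, not part of pheweb, and `looks_like_text` is a hypothetical helper name. An SQLite database such as cpras-rsids.sqlite3 is expected to fail this check, since its file header contains a NUL byte.

```python
import codecs


def looks_like_text(path, nbytes=4096):
    """Heuristic: True if the first nbytes contain no NUL and decode as UTF-8."""
    with open(path, "rb") as f:
        chunk = f.read(nbytes)
    if b"\x00" in chunk:
        return False
    # An incremental decoder tolerates a multi-byte character that is
    # cut off at the end of the chunk.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

If the `.tsv` files pass this check, they can be inspected with ordinary text tools and their layout copied by a custom script.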