statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

matrix not being updated with phenotypes #159

Closed. jw-insitro closed this issue 3 years ago.

jw-insitro commented 3 years ago

This may be due to user error, but in my installation the generated matrix file fails to include my phenotype information. This is most easily demonstrated by running pheweb matrix twice in succession:

[ec2-user@ip-172-31-145-205 pheweb-site]$ pheweb matrix
re-running because cur matrix has wrong phenos.
- phenos in pheno-list.json but not matrix.tsv.gz: 'Vitronectin-CSF-SomaLogic', 'Siglec-9-CSF-SomaLogic', 'AD-Risk1', 'sICAM-1-CSF-SomaLogic', 'sLeptin-R-CSF-SomaLogic'
- phenos in matrix.tsv.gz but not pheno-list.json:

[ec2-user@ip-172-31-145-205 pheweb-site]$ pheweb matrix
re-running because cur matrix has wrong phenos.
- phenos in pheno-list.json but not matrix.tsv.gz: 'sLeptin-R-CSF-SomaLogic', 'AD-Risk1', 'sICAM-1-CSF-SomaLogic', 'Siglec-9-CSF-SomaLogic', 'Vitronectin-CSF-SomaLogic'
- phenos in matrix.tsv.gz but not pheno-list.json:

Having delved into the code, here are snippets from the sites/sites.tsv, pheno_gz/* and matrix.tsv.gz files:

sites.tsv

chrom   pos     ref     alt     rsids   nearest_genes
1       662622  G       A       rs61769339      OR4F16
1       693625  T       C       rs190214723     OR4F16
1       693731  A       G       rs12238997      OR4F16
1       705882  G       A       rs72631875      OR4F16

pheno_gz/AD-Risk1.gz

chrom   pos     ref     alt     rsids   nearest_genes   pval    beta
1       662622  G       A       rs61769339      OR4F16  0.029   0.1
1       693625  T       C       rs190214723     OR4F16  0.81    -0.016
1       693731  A       G       rs12238997      OR4F16  0.016   0.1
1       705882  G       A       rs72631875      OR4F16  0.76    0.018

matrix.tsv.gz

#chrom  pos     ref     alt     rsids   nearest_genes
1       662622  G       A       rs61769339      OR4F16
1       693625  T       C       rs190214723     OR4F16
1       693731  A       G       rs12238997      OR4F16
1       705882  G       A       rs72631875      OR4F16

I confirmed that the glob call in the C++ code that writes the matrix picks up all of the files in pheno_gz/*, but I did not explore any further.

Any suggestions or advice? Or is this a PEBKAC error?
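
For anyone hitting the same symptom, here is a minimal sketch of the comparison that the `pheweb matrix` message describes: which phenocodes appear in pheno-list.json but not in the matrix header. It is not pheweb's own code. It assumes that pheno-list.json is a JSON list of objects with a `phenocode` key, and that per-phenotype matrix columns are named `<field>@<phenocode>` (for example `pval@AD-Risk1`). The function names are hypothetical, and the column naming should be checked against your own files.

```python
import gzip
import json


def matrix_phenocodes(matrix_path):
    """Return the set of phenocodes named in the matrix header line."""
    with gzip.open(matrix_path, "rt") as f:
        header = f.readline().rstrip("\n").lstrip("#").split("\t")
    # Assumption: per-phenotype columns look like "<field>@<phenocode>".
    # Columns without "@" (chrom, pos, ref, alt, ...) are ignored.
    return {col.split("@", 1)[1] for col in header if "@" in col}


def missing_from_matrix(pheno_list_path, matrix_path):
    """Return phenocodes listed in pheno-list.json but absent from the matrix header."""
    with open(pheno_list_path) as f:
        # Assumption: a JSON list of objects, each with a "phenocode" key.
        listed = {pheno["phenocode"] for pheno in json.load(f)}
    return sorted(listed - matrix_phenocodes(matrix_path))
```

With a header like the one shown above, which stops at `nearest_genes`, `matrix_phenocodes` returns an empty set, so every phenotype in pheno-list.json is reported as missing.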

pjvandehaar commented 3 years ago

Are you on the latest version? Run pheweb -v. Versions 1.3.0 - 1.3.3 had problems in that code.

Haha, I'm always a fan of dismissing errors as PEBKAC but this looks like pheweb's fault.
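
As a rough sketch of the version check suggested above, the snippet below reads the installed pheweb version from package metadata and tests whether it falls in the 1.3.0 to 1.3.3 range mentioned in this thread. The helper names are hypothetical, and the parser assumes plain `X.Y.Z` version strings with no pre-release suffix.

```python
from importlib.metadata import PackageNotFoundError, version

# Range taken from the comment above: versions 1.3.0 through 1.3.3.
AFFECTED = ((1, 3, 0), (1, 3, 3))


def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints (assumes purely numeric parts)."""
    return tuple(int(part) for part in text.split(".")[:3])


def is_affected(text):
    """Return True if the version lies inside the affected range, inclusive."""
    low, high = AFFECTED
    return low <= parse_version(text) <= high


if __name__ == "__main__":
    try:
        installed = version("pheweb")
    except PackageNotFoundError:
        print("pheweb is not installed in this environment")
    else:
        status = "in the affected range" if is_affected(installed) else "outside the affected range"
        print(f"pheweb {installed} is {status}")
```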

jw-insitro commented 3 years ago

Thanks for the quick reply! I initially installed via pip3 install pheweb, but that apparently installed 1.2.0.

Re-installing via pip3 install 'pheweb==1.3.5' has me on track. Perhaps this is something worth noting in the README? :smile:

pjvandehaar commented 3 years ago

PheWeb 1.2.0 was just released two weeks ago, so if you installed then that'd be why.

I just added a suggestion to run pip3 install --upgrade pheweb in the new issue template. Thanks for the suggestion. 👍

If you have any more suggestions about changes to make to pheweb or features to add next, I'd enjoy hearing about them at pjvh@umich.edu.