statgen / pheweb

A tool to build a website to browse hundreds or thousands of GWAS.
MIT License

Allow searching by gene #22

Closed sgagliano closed 7 years ago

pjvandehaar commented 7 years ago

Later: show best phenos in gene tooltip

Maybe when a user hovers over a gene, the other phenotypes should go in the tooltip. How do I make that happen? LocusZoom's chain mechanism can probably support it, but I don't understand how it works yet.

pjvandehaar commented 7 years ago

Choosing a region for each gene

When a user searches for a gene, what region should be shown? Just the gene plus max(200bp, 5%) upstream and downstream? Sarah says that the ideal region varies by gene. If there's a peak near a gene, it might be good to extend the gene's region to capture that peak.

In that case, I would start with a wide region around a gene. Then I would get the (best_hit_pval, best_hit_pos) for each phenotype. Then I would narrow the region to just barely contain both the gene and all best_hit_pos where p<1e-6.
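A rough sketch of that narrowing step, to make the idea concrete. The function name, the `wide_padding` default, and the shape of `best_hits` (one `(best_hit_pval, best_hit_pos)` pair per phenotype) are assumptions for illustration, not existing PheWeb code.

```python
from typing import Iterable, Tuple


def narrow_region(
    gene_start: int,
    gene_end: int,
    best_hits: Iterable[Tuple[float, int]],
    wide_padding: int = 500_000,  # assumed width of the initial wide region
    pval_cutoff: float = 1e-6,    # cutoff taken from the comment above
) -> Tuple[int, int]:
    """Return the smallest (start, end) that contains the gene and every
    significant best hit found inside the initial wide region."""
    # Step 1: the wide region around the gene.
    wide_start = max(0, gene_start - wide_padding)
    wide_end = gene_end + wide_padding

    # Step 2: start from the gene itself and grow only as far as needed.
    start, end = gene_start, gene_end
    for pval, pos in best_hits:
        in_wide_region = wide_start <= pos <= wide_end
        if pval < pval_cutoff and in_wide_region:
            start = min(start, pos)
            end = max(end, pos)
    return start, end
```

A gene with no significant hits nearby keeps exactly its own boundaries, so some minimum padding would probably still be wanted on top of this.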

pjvandehaar commented 7 years ago

Getting gene aliases

dbNSFP gets its gene aliases from http://www.genenames.org/ . I want the columns [symbol, prev_symbol, alias_symbol]. I might also want ensembl_gene_id.

The full dataset (in json) is at:

From the page http://www.genenames.org/cgi-bin/statistics I can select which types of gene I want. TODO: figure out which of those files (or maybe a few of them) I need. I can also build a custom download with only the columns I need, like http://www.genenames.org/cgi-bin/download?col=gd_app_sym&col=gd_prev_sym&col=gd_aliases&col=gd_pub_ensembl_id&status=Approved&status=Entry+Withdrawn&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&hgnc_dbtag=on&submit=submit .

Let's not do any gene subsetting. Just use the table from that long URL, and merge it into our gencode genes as aliases.
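Here is a minimal sketch of that merge. It assumes the download is tab-separated with one header row, that the columns arrive in the order requested in the URL (approved symbol, previous symbols, aliases, Ensembl ID), and that multi-valued fields are comma-separated. None of that is verified here, and `build_alias_map` is a hypothetical helper, not existing PheWeb code.

```python
import csv
import io
from typing import Dict, Iterable, List


def build_alias_map(hgnc_tsv_text: str, known_symbols: Iterable[str]) -> Dict[str, List[str]]:
    """Map each previous symbol or alias to the gencode symbol(s) it refers to.

    Rows whose approved symbol is not one of our gencode genes are skipped.
    """
    known = set(known_symbols)
    alias_to_symbols: Dict[str, List[str]] = {}

    reader = csv.reader(io.StringIO(hgnc_tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if len(row) < 3:
            continue  # malformed or empty line
        symbol = row[0].strip()
        if symbol not in known:
            continue
        # Columns 1 and 2 are assumed to hold previous symbols and aliases.
        for field in row[1:3]:
            for alias in field.split(","):
                alias = alias.strip()
                # Never let an alias shadow a real gene symbol.
                if alias and alias not in known:
                    alias_to_symbols.setdefault(alias, []).append(symbol)
    return alias_to_symbols
```

The map keeps a list per alias because one alias can point at more than one gene; the search box would then have to offer all of them.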

pjvandehaar commented 7 years ago

Number of phenotypes to show

Matthew recommends showing 2-9 phenotypes, depending on their significance level, and showing the p-values beside them (perhaps with a sparkline).
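One possible reading of that rule, as a sketch: rank phenotypes by p-value, count how many pass a significance cutoff, and clamp that count to the 2-9 range. The cutoff value and the `(name, pval)` input shape are placeholders I made up; the recommendation above doesn't specify them.

```python
from typing import List, Sequence, Tuple


def pick_phenotypes(
    phenos: Sequence[Tuple[str, float]],
    min_shown: int = 2,
    max_shown: int = 9,
    pval_cutoff: float = 1e-4,  # placeholder cutoff, not from the recommendation
) -> List[Tuple[str, float]]:
    """Return the most significant phenotypes, showing between min_shown
    and max_shown of them depending on how many pass the cutoff."""
    ranked = sorted(phenos, key=lambda pheno: pheno[1])  # smallest p-value first
    n_significant = sum(1 for _, pval in ranked if pval < pval_cutoff)
    n_shown = max(min_shown, min(max_shown, n_significant))
    # Slicing handles the case where fewer than n_shown phenotypes exist.
    return ranked[:n_shown]
```

Each returned pair carries its p-value, so the caller can print it beside the phenotype name.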