sgagliano closed this 7 years ago
When a user searches for a gene, what region should be shown? Just the gene plus max(200bp, 5%) upstream and downstream? Sarah says that the ideal region varies by gene. If there's a peak near a gene, it might be good to extend the gene's region to capture that peak.
In that case, I would start with a wide region around a gene. Then I would get the (best_hit_pval, best_hit_pos) for each phenotype. Then I would narrow the region to just barely contain both the gene and all best_hit_pos where p<1e-6.
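A minimal sketch of that narrowing step, assuming the per-phenotype best hits have already been fetched from the wide window as (best_hit_pval, best_hit_pos) pairs. The function name `narrow_region` and its parameters are hypothetical, not part of any existing code here:

```python
def narrow_region(gene_start, gene_end, best_hits, pval_cutoff=1e-6):
    """Return the smallest (start, end) that contains the gene and every
    best-hit position whose p-value is below pval_cutoff.

    best_hits: iterable of (best_hit_pval, best_hit_pos), one per phenotype,
    collected from a wide window around the gene (hypothetical input shape).
    """
    start, end = gene_start, gene_end
    for pval, pos in best_hits:
        if pval < pval_cutoff:
            # Stretch the region just far enough to include this hit.
            start = min(start, pos)
            end = max(end, pos)
    return start, end


# Hypothetical example: one hit upstream, one downstream, one not significant.
region = narrow_region(1000, 2000, [(1e-8, 500), (1e-3, 100), (1e-7, 2500)])
```

If no phenotype has a hit below the cutoff, this returns just the gene's own coordinates, so some default padding would still need to be applied afterwards.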
dbNSFP gets its gene aliases from http://www.genenames.org/ . I want the columns [symbol, prev_symbol, alias_symbol]. I might also want ensembl_gene_id.
The full dataset (in json) is at:
From the page http://www.genenames.org/cgi-bin/statistics I can select which type of gene I want. TODO: figure out which of those files (or maybe a few of them) I need. I can also build a custom download with only the columns I need, like http://www.genenames.org/cgi-bin/download?col=gd_app_sym&col=gd_prev_sym&col=gd_aliases&col=gd_pub_ensembl_id&status=Approved&status=Entry+Withdrawn&status_opt=2&where=&order_by=gd_app_sym_sort&format=text&limit=&hgnc_dbtag=on&submit=submit.
Let's not do any gene subsetting. Just use the table from that long URL, and merge it into our gencode genes as aliases.
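Roughly, the merge could look like the sketch below. It assumes the download is tab-separated text with a header row, and that the previous-symbol and alias fields hold comma-separated lists. The column names passed in (`symbol`, `prev_symbol`, `alias_symbol`) are placeholders, since the actual header text in the download may differ, and both function names are hypothetical:

```python
import csv
import io
from collections import defaultdict


def parse_hgnc_aliases(tsv_text, symbol_col="symbol",
                       list_cols=("prev_symbol", "alias_symbol"), sep=","):
    """Map each approved symbol to the set of its previous and alias symbols.

    Column names and the list separator are assumptions about the file layout.
    """
    aliases = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        symbol = (row.get(symbol_col) or "").strip()
        if not symbol:
            continue
        for col in list_cols:
            for name in (row.get(col) or "").split(sep):
                name = name.strip()
                if name and name != symbol:
                    aliases[symbol].add(name)
    return aliases


def merge_aliases(gencode_symbols, aliases):
    """Attach a sorted alias list to every gencode gene symbol (empty if none)."""
    return {sym: sorted(aliases.get(sym, ())) for sym in gencode_symbols}


# Hypothetical two-row table standing in for the real download.
example_tsv = (
    "symbol\tprev_symbol\talias_symbol\n"
    "BRCA1\tBRCC1\tRNF53, PPP1R53\n"
    "TP53\t\tp53, LFS1\n"
)
merged = merge_aliases(["BRCA1", "TP53", "FOO"], parse_hgnc_aliases(example_tsv))
```

This matches on gene symbol only. Matching on ensembl_gene_id instead would probably be more robust where gencode and the HGNC table disagree on a symbol, but that is not shown here.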
Matthew recommends showing 2-9 phenotypes, depending on their significance level, and showing the p-values beside them (perhaps with a sparkline).
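One way to read "2-9 depending on significance" is: show every phenotype whose best p-value passes a cutoff, but never fewer than 2 or more than 9. The sketch below does that. The cutoff of 1e-6 is borrowed from the region-narrowing idea above and is an assumption, not part of Matthew's recommendation, and `pick_phenotypes` is a hypothetical name:

```python
def pick_phenotypes(best_pvals, min_shown=2, max_shown=9, pval_cutoff=1e-6):
    """Return (phenotype, pval) pairs to display, most significant first.

    best_pvals: dict mapping phenotype name -> best p-value in the region.
    The number shown is the count of phenotypes below pval_cutoff, clamped
    to the range [min_shown, max_shown].
    """
    ranked = sorted(best_pvals.items(), key=lambda kv: kv[1])
    n_significant = sum(1 for _, p in ranked if p < pval_cutoff)
    n_shown = max(min_shown, min(max_shown, n_significant))
    return ranked[:n_shown]


# Hypothetical p-values for four phenotypes.
shown = pick_phenotypes({"a": 1e-9, "b": 0.5, "c": 1e-7, "d": 1e-8})
```

Returning the p-values alongside the names keeps them available for display beside each phenotype.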
Later: show best phenos in gene tooltip
Maybe when people hover over a gene, the other phenotypes should go in the tooltip. How do I make that happen? I'm sure LocusZoom's chain mechanism can support it, but I don't understand how it works.