statgen / locuszoom

A Javascript/d3 embeddable plugin for interactively visualizing statistical genetic data from customizable sources.
https://statgen.github.io/locuszoom/
MIT License

Help for user data #105

Closed xtmgah closed 6 years ago

xtmgah commented 7 years ago

Hello:

Is there any way to use custom GWAS/eQTL data in locuszoom.js? And how do I add it as a data source (the data are txt files, without API support)? Thanks.

xtmgah commented 7 years ago

Also, I tried to list all available reference panels using the API, but the link is not working. Thanks.

curl "http://portaldev.sph.umich.edu/api/v1/statistic/pair/LD/"

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

404 Not Found

Not Found

The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.

pjvandehaar commented 7 years ago

That's from http://portaldev.sph.umich.edu/docs/api/v1/#linkage-disequilibrium? I think this issue is the same as https://github.com/statgen/locuszoom-api/issues/2. @welchr @dtaliun ?

Frencil commented 7 years ago

@xtmgah to your original question about using a locally-sourced .txt file, see the latest comment on issue #38. It identifies a way this is currently being demonstrated in the repo.

One caveat though is that the files being loaded are json, not txt. If you can meet LocusZoom.js halfway by refactoring your data to be a valid json file then you can load pretty much anything as a datasource. See the static data files in the repo at that link for examples of how data sets like association and LD can be represented with current built-in datasources.

welchr commented 7 years ago

It is indeed statgen/locuszoom-api#2. Daniel or I will implement it eventually. Right now we use hard coded sources:

REFERENCE  PANEL  POPULATION  BUILD   VERSION
1          1000G  ALL         GRCh37  Phase 3 v5a
2          1000G  EUR         GRCh37  Phase 3 v5a

For example, to retrieve LD within a region for 1000G ALL (reference 1) with index SNP 16:53819169_T/C:

http://portaldev.sph.umich.edu/api/v1/pair/LD/results/?filter=reference eq 1 and chromosome2 eq '16' and position2 ge 53519169 and position2 le 54119169 and variant1 eq '16:53819169_T/C'

pjvandehaar commented 7 years ago

Documentation-driven development, I see.

xtmgah commented 7 years ago

@welchr Yes. I used the following command, and it worked.

curl -G "http://portaldev.sph.umich.edu/api/v1/statistic/pair/LD/results/" --data-urlencode "filter=reference eq 1 and chromosome2 eq '16' and position2 ge 53519169 and position2 le 54119169 and variant1 eq '16:53819169_T/C'"
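For scripted use, the same request URL can be assembled in Python. This is only a sketch: the endpoint and filter syntax are copied from the curl command above, while the function name `build_ld_url` and the `flank` parameter are made up for illustration (the default of 300000 reproduces the window in the example). It only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the curl command above.
LD_ENDPOINT = "http://portaldev.sph.umich.edu/api/v1/statistic/pair/LD/results/"


def build_ld_url(reference, chrom, position, ref, alt, flank=300000):
    """Return the LD query URL for an index variant and a +/- flank window.

    `build_ld_url` and `flank` are hypothetical names, not part of any API.
    """
    variant = "{}:{}_{}/{}".format(chrom, position, ref, alt)
    ld_filter = (
        "reference eq {reference} and chromosome2 eq '{chrom}' "
        "and position2 ge {start} and position2 le {end} "
        "and variant1 eq '{variant}'"
    ).format(
        reference=reference,
        chrom=chrom,
        start=position - flank,
        end=position + flank,
        variant=variant,
    )
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return LD_ENDPOINT + "?" + urlencode({"filter": ld_filter}, quote_via=quote)


url = build_ld_url(1, "16", 53819169, "T", "C")
print(url)
```

Passing `2` as the first argument would select the 1000G EUR panel from the table above.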

xtmgah commented 7 years ago

@Frencil I tried to use the demo offline data source in the index.html file, but it looks like it's not working. I set online = false, which should make it load from the staticdata folder, but nothing happens when I open the HTML file (the LocusZoom region is just blank). Did you test it before?

xtmgah commented 7 years ago

@Frencil BTW, is there a simple script to convert a text file to JSON, for example for the GWAS/eQTL p-value data format?

Frencil commented 7 years ago

@xtmgah Pull down the latest version of master for this repo... there was a bug in the staticdata example in the repo's index.html that I fixed last night. You'll want the value of offline to be something truthy (e.g. index.html?offline=1) to force offline mode.

As for how to convert your data to a consumable format, @pjvh may know of something specific but a lot of it depends on what you're starting with. Does your data conform to a standard that's documented somewhere (if so, can you link to the format documentation)? LocusZoom.js is designed to be pretty generic. While the included data sources and layouts favor specific formats the framework is designed to be extensible so that new data sources and layouts for new types/formats of data can be added with relative ease. These parts of the documentation may help:

xtmgah commented 7 years ago

@Frencil Thanks. Offline mode is working now. My data format is not documented anywhere; it just has some basic columns, for example SNP ID, p-value, and a few others such as effect size (very similar to assoc_10_114550452-115067678 in the staticdata folder). I think it's very easy to convert, but I hope @pjvandehaar can give me some suggestions. Thanks a lot.

pjvandehaar commented 7 years ago

Yes, I can help with that. Could you paste the first ~10 rows of each of your files here, so I can see the formats?

xtmgah commented 7 years ago

@pjvandehaar similar to the following one:

gene_id variant_id  tss_distance    pval_nominal    slope   slope_se    pval_nominal_threshold
cg17149495  1:798400    267441  7.30894e-08 -0.112488   0.0189413   1.00921e-05
cg17149495  1:798959    268000  7.31282e-08 -0.111954   0.0188518   1.00921e-05
cg02288058  1:798400    232797  3.22535e-05 -0.0551874  0.0125136   3.61876e-05
cg02288058  1:798959    233356  3.25037e-05 -0.0549054  0.0124555   3.61876e-05
cg00034556  1:798400    231669  1.3333e-06  -0.0891881  0.01704 9.95279e-06
cg00034556  1:798959    232228  1.30662e-06 -0.0888293  0.0169552   9.95279e-06
cg15394630  1:798400    231194  7.55636e-07 -0.0366296  0.00681527  1.13151e-05
cg15394630  1:798959    231753  7.42208e-07 -0.0364781  0.00678152  1.13151e-05
cg23917638  1:798400    230899  1.26803e-09 -0.0743637  0.0108091   1.25392e-05
cg23917638  1:798959    231458  1.1988e-09  -0.0740977  0.0107505   1.25392e-05
cg18761878  1:798400    229925  6.4307e-06  -0.0381297  0.00788534  9.19423e-06
cg18761878  1:798959    230484  6.41223e-06 -0.0379537  0.00784774  9.19423e-06
cg08858441  1:798400    228973  1.56258e-12 -0.108516   0.0129452   7.97414e-06

Thanks.

xtmgah commented 7 years ago

This may be easy to check for each column:

$1  gene_id                 cg17149495
$2  variant_id              1:798400
$3  tss_distance            267441
$4  pval_nominal            7.30894e-08
$5  slope                   -0.112488
$6  slope_se                0.0189413
$7  pval_nominal_threshold  1.00921e-05

dtaliun commented 7 years ago

@xtmgah Do you have alleles for variants?

xtmgah commented 7 years ago

@pjvandehaar Yes, I can easily add them. But is that necessary? Do you have any script or documentation, so I can check how to convert our data to JSON?
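If alleles are added, the API example earlier in this thread writes a variant as 16:53819169_T/C. A small helper along these lines could build that label from a chrom:pos ID plus two allele values. The function name and the `ref`/`alt` arguments are hypothetical, and nothing in this thread confirms that LocusZoom requires alleles.

```python
def variant_label(variant_id, ref, alt):
    """Turn 'chrom:pos' plus two alleles into 'chrom:pos_ref/alt'.

    Hypothetical helper; the output format follows the API example
    in this thread (16:53819169_T/C).
    """
    chrom, pos = variant_id.split(":")
    # int() rejects a non-numeric position early instead of passing it through.
    return "{}:{}_{}/{}".format(chrom, int(pos), ref, alt)


print(variant_label("16:53819169", "T", "C"))
```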

pjvandehaar commented 7 years ago

What is pval_nominal_threshold?

xtmgah commented 7 years ago

@pjvandehaar You can omit this column. It's only additional information (the cutoff used to identify genome-wide significant p-values) from our eQTL project. So you can use pval_nominal as the GWAS p-value. Thanks.

pjvandehaar commented 7 years ago

Do you have just one trait, or lots of traits? How many variants?

If it's just a few traits, and <1M variants each, this'll work:

s = '''gene_id  variant_id  tss_distance    pval_nominal    slope   slope_se    pval_nominal_threshold
cg17149495  1:798400    267441  7.30894e-08 -0.112488   0.0189413   1.00921e-05
cg17149495  1:798959    268000  7.31282e-08 -0.111954   0.0188518   1.00921e-05
cg02288058  1:798400    232797  3.22535e-05 -0.0551874  0.0125136   3.61876e-05
cg02288058  1:798959    233356  3.25037e-05 -0.0549054  0.0124555   3.61876e-05
cg00034556  1:798400    231669  1.3333e-06  -0.0891881  0.01704 9.95279e-06
cg00034556  1:798959    232228  1.30662e-06 -0.0888293  0.0169552   9.95279e-06
cg15394630  1:798400    231194  7.55636e-07 -0.0366296  0.00681527  1.13151e-05
cg15394630  1:798959    231753  7.42208e-07 -0.0364781  0.00678152  1.13151e-05
cg23917638  1:798400    230899  1.26803e-09 -0.0743637  0.0108091   1.25392e-05
cg23917638  1:798959    231458  1.1988e-09  -0.0740977  0.0107505   1.25392e-05
cg18761878  1:798400    229925  6.4307e-06  -0.0381297  0.00788534  9.19423e-06
cg18761878  1:798959    230484  6.41223e-06 -0.0379537  0.00784774  9.19423e-06
cg08858441  1:798400    228973  1.56258e-12 -0.108516   0.0129452   7.97414e-06
'''
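import csv, json

# Parse the tab-separated text; DictReader takes column names from the first line.
ds = list(csv.DictReader(s.split('\n'), delimiter='\t'))
# Build one list per output column; variant_id is split into chrom and position.
df = {
  'pvalue': [float(d['pval_nominal']) for d in ds],
  'chrom': [d['variant_id'].split(':')[0] for d in ds],
  'position': [int(d['variant_id'].split(':')[1]) for d in ds],
  'id': [d['variant_id'] for d in ds],
}
# Print as JSON (double-quoted keys) rather than as a Python dict.
print(json.dumps(df))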

That produces:

{
"pvalue": [7.30894e-08, 7.31282e-08, 3.22535e-05, 3.25037e-05, 1.3333e-06, 1.30662e-06, 7.55636e-07, 7.42208e-07, 1.26803e-09, 1.1988e-09, 6.4307e-06, 6.41223e-06, 1.56258e-12], 
"chrom": ["1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"], 
"position": [798400, 798959, 798400, 798959, 798400, 798959, 798400, 798959, 798400, 798959, 798400, 798959, 798400],
"id": ["1:798400", ...]
}

That's close enough to what http://portaldev.sph.umich.edu/api/v1/single/results/?filter=analysis%20in%2052%20and%20chromosome%20in%20%20%2716%27%20and%20position%20ge%2053809169%20and%20position%20le%2053829169 produces to plug it into LocusZoom I think.
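Before plugging a dictionary like this into LocusZoom, a quick sanity check can catch truncated or mismatched columns. The sketch below is an assumption-laden example, not a documented LocusZoom requirement: the function name `check_columns` is made up, the required keys are the four produced above, and the p-value range check is simply a reasonable guess at what should hold.

```python
def check_columns(df, required=("pvalue", "chrom", "position", "id")):
    """Raise ValueError if the column dictionary looks malformed.

    Returns the number of rows when every check passes.
    """
    missing = [key for key in required if key not in df]
    if missing:
        raise ValueError("missing columns: {}".format(", ".join(missing)))
    # Every column must describe the same number of variants.
    lengths = {key: len(df[key]) for key in required}
    if len(set(lengths.values())) != 1:
        raise ValueError("columns differ in length: {}".format(lengths))
    # Assumed rule: p-values lie in (0, 1].
    bad = [p for p in df["pvalue"] if not 0 < p <= 1]
    if bad:
        raise ValueError("p-values outside (0, 1]: {}".format(bad))
    return lengths[required[0]]


example = {
    "pvalue": [7.30894e-08, 7.31282e-08],
    "chrom": ["1", "1"],
    "position": [798400, 798959],
    "id": ["1:798400", "1:798959"],
}
print(check_columns(example))
```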

xtmgah commented 7 years ago

That's great. I am not familiar with Python. Could you write a few more lines of code to read the data from a txt file and write the output to a JSON file (so I only need to run the program and pass in my txt files)? Thanks.

pjvandehaar commented 7 years ago

Do you have just one trait, or lots of traits? How many variants?

xtmgah commented 7 years ago

A few traits (fewer than 10). The number of variants depends on the locus, but should be less than 1000.

pjvandehaar commented 7 years ago

Here's a program to do it:

#!/usr/bin/env python

import csv, json, sys

infilepath = sys.argv[1]
outfilepath = sys.argv[2]

print('Converting from {} to {}'.format(infilepath, outfilepath))

with open(infilepath) as f:
    # Skip blank lines; the first non-blank line supplies the column names.
    objs = list(csv.DictReader((line for line in f if line.strip()), delimiter='\t'))

df = {
  'pvalue': [float(d['pval_nominal']) for d in objs],
  'chrom': [d['variant_id'].split(':')[0] for d in objs],
  'position': [int(d['variant_id'].split(':')[1]) for d in objs],
  'id': [d['variant_id'] for d in objs],
}

with open(outfilepath, 'w') as f:
    json.dump(df, f)

Save that as a.py. Then run python a.py yourfile.txt newfile.json.
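The script above puts every row into a single set of columns, so a file containing several gene_id values would repeat the same positions. If each gene_id is meant to be its own trait (an assumption; the thread does not say so explicitly), a variation like the following sketch could write one JSON file per trait. The function name `split_by_trait`, the `trait_column` parameter, and the output file naming are all made up for illustration.

```python
import csv
import json
import os
from collections import defaultdict


def split_by_trait(infilepath, outdir, trait_column="gene_id"):
    """Write one column-oriented JSON file per distinct value of trait_column.

    Returns the list of file paths written.
    """
    by_trait = defaultdict(
        lambda: {"pvalue": [], "chrom": [], "position": [], "id": []}
    )
    with open(infilepath) as f:
        # Skip blank lines; the first non-blank line is the header.
        rows = csv.DictReader((line for line in f if line.strip()), delimiter="\t")
        for row in rows:
            chrom, pos = row["variant_id"].split(":")
            cols = by_trait[row[trait_column]]
            cols["pvalue"].append(float(row["pval_nominal"]))
            cols["chrom"].append(chrom)
            cols["position"].append(int(pos))
            cols["id"].append(row["variant_id"])

    os.makedirs(outdir, exist_ok=True)
    written = []
    for trait, cols in by_trait.items():
        path = os.path.join(outdir, trait + ".json")
        with open(path, "w") as out:
            json.dump(cols, out)
        written.append(path)
    return written
```

Calling `split_by_trait("yourfile.txt", "json_out")` would then produce files such as json_out/cg17149495.json, one per gene_id in the input.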

xtmgah commented 7 years ago

@pjvandehaar That's great. Thanks so much...