Closed xtmgah closed 6 years ago
Also, I tried to list all available reference panels using the API, but the link is not working. Thanks.
curl "http://portaldev.sph.umich.edu/api/v1/statistic/pair/LD/"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.
That's from http://portaldev.sph.umich.edu/docs/api/v1/#linkage-disequilibrium? I think this issue is the same as https://github.com/statgen/locuszoom-api/issues/2. @welchr @dtaliun ?
@xtmgah to your original question about using a locally-sourced .txt file, see the latest comment on issue #38. It identifies a way this is currently being demonstrated in the repo.
One caveat though is that the files being loaded are json, not txt. If you can meet LocusZoom.js halfway by refactoring your data to be a valid json file then you can load pretty much anything as a datasource. See the static data files in the repo at that link for examples of how data sets like association and LD can be represented with current built-in datasources.
It is indeed statgen/locuszoom-api#2. Daniel or I will implement it eventually. Right now we use hard coded sources:
REFERENCE | PANEL | POPULATION | BUILD | VERSION |
---|---|---|---|---|
1 | 1000G | ALL | GRCh37 | Phase 3 v5a |
2 | 1000G | EUR | GRCh37 | Phase 3 v5a |
For example, to retrieve LD within a region for 1000G ALL (reference 1) with index SNP 16:53819169_T/C:
http://portaldev.sph.umich.edu/api/v1/pair/LD/results/?filter=reference eq 1 and chromosome2 eq '16' and position2 ge 53519169 and position2 le 54119169 and variant1 eq '16:53819169_T/C'
Documentation-driven development, I see.
@welchr Yes. I used the following command, and it works.
curl -G "http://portaldev.sph.umich.edu/api/v1/statistic/pair/LD/results/" --data-urlencode "filter=reference eq 1 and chromosome2 eq '16' and position2 ge 53519169 and position2 le 54119169 and variant1 eq '16:53819169_T/C'"
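For anyone calling the API from a script rather than curl, here is a minimal sketch of building the same URL-encoded request in Python. The helper name `build_ld_url` and its parameters are made up for illustration; only the endpoint and the filter expression come from the command above. It only constructs the URL and does not contact the server.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the curl command above.
BASE_URL = "http://portaldev.sph.umich.edu/api/v1/statistic/pair/LD/results/"


def build_ld_url(reference, chrom, start, end, index_variant):
    """Return the LD query URL with the filter expression percent-encoded.

    Hypothetical helper: the argument names are illustrative, not part of the API.
    """
    filter_expr = (
        f"reference eq {reference} and chromosome2 eq '{chrom}' "
        f"and position2 ge {start} and position2 le {end} "
        f"and variant1 eq '{index_variant}'"
    )
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE_URL + "?" + urlencode({"filter": filter_expr}, quote_via=quote)


url = build_ld_url(1, "16", 53519169, 54119169, "16:53819169_T/C")
print(url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client.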
@Frencil I tried to use the demo offline data source in the index.html file, but it doesn't seem to be working. I set online = false, so it should load from the staticdata folder, but nothing happens when I open the HTML file (the LocusZoom region is just blank). Did you test it before?
@Frencil BTW, is there a simple script to convert a text file to JSON? For example, for the GWAS/eQTL p-value data format.
@xtmgah Pull down the latest version of master for this repo. There was a bug in the staticdata example in the repo's index.html that I fixed last night. You'll want the value of `offline` to be something truthy (e.g. `index.html?offline=1`) to force offline mode.
As for how to convert your data to a consumable format, @pjvh may know of something specific but a lot of it depends on what you're starting with. Does your data conform to a standard that's documented somewhere (if so, can you link to the format documentation)? LocusZoom.js is designed to be pretty generic. While the included data sources and layouts favor specific formats the framework is designed to be extensible so that new data sources and layouts for new types/formats of data can be added with relative ease. These parts of the documentation may help:
@Frencil Thanks. Offline mode is working now. My data format is not documented anywhere; it just has some basic columns, for example SNP ID, p-value, and a few others such as effect size (very similar to assoc_10_114550452-115067678 in the staticdata folder). I think it should be very easy to convert, but I hope @pjvandehaar can give me some suggestions. Thanks a lot.
Yes, I can help with that. Could you paste the first ~10 rows of each of your files here, so I can see the formats?
@pjvandehaar similar to the following one:
gene_id variant_id tss_distance pval_nominal slope slope_se pval_nominal_threshold
cg17149495 1:798400 267441 7.30894e-08 -0.112488 0.0189413 1.00921e-05
cg17149495 1:798959 268000 7.31282e-08 -0.111954 0.0188518 1.00921e-05
cg02288058 1:798400 232797 3.22535e-05 -0.0551874 0.0125136 3.61876e-05
cg02288058 1:798959 233356 3.25037e-05 -0.0549054 0.0124555 3.61876e-05
cg00034556 1:798400 231669 1.3333e-06 -0.0891881 0.01704 9.95279e-06
cg00034556 1:798959 232228 1.30662e-06 -0.0888293 0.0169552 9.95279e-06
cg15394630 1:798400 231194 7.55636e-07 -0.0366296 0.00681527 1.13151e-05
cg15394630 1:798959 231753 7.42208e-07 -0.0364781 0.00678152 1.13151e-05
cg23917638 1:798400 230899 1.26803e-09 -0.0743637 0.0108091 1.25392e-05
cg23917638 1:798959 231458 1.1988e-09 -0.0740977 0.0107505 1.25392e-05
cg18761878 1:798400 229925 6.4307e-06 -0.0381297 0.00788534 9.19423e-06
cg18761878 1:798959 230484 6.41223e-06 -0.0379537 0.00784774 9.19423e-06
cg08858441 1:798400 228973 1.56258e-12 -0.108516 0.0129452 7.97414e-06
Thanks.
This may make it easier to check each column:
$1 gene_id cg17149495
$2 variant_id 1:798400
$3 tss_distance 267441
$4 pval_nominal 7.30894e-08
$5 slope -0.112488
$6 slope_se 0.0189413
$7 pval_nominal_threshold 1.00921e-05
@xtmgah Do you have alleles for variants?
@pjvandehaar Yes, I can easily add them. But is that necessary? Do you have a script or documentation I can use to convert our data to JSON?
What is `pval_nominal_threshold`?
@pjvandehaar You can omit this column. It's only additional information from our eQTL project (the cutoff used to identify genome-wide significant p-values). So you can use pval_nominal as the GWAS p-value. Thanks.
Do you have just one trait, or lots of traits? How many variants?
If it's just a few traits, and <1M variants each, this'll work:
s = '''gene_id variant_id tss_distance pval_nominal slope slope_se pval_nominal_threshold
cg17149495 1:798400 267441 7.30894e-08 -0.112488 0.0189413 1.00921e-05
cg17149495 1:798959 268000 7.31282e-08 -0.111954 0.0188518 1.00921e-05
cg02288058 1:798400 232797 3.22535e-05 -0.0551874 0.0125136 3.61876e-05
cg02288058 1:798959 233356 3.25037e-05 -0.0549054 0.0124555 3.61876e-05
cg00034556 1:798400 231669 1.3333e-06 -0.0891881 0.01704 9.95279e-06
cg00034556 1:798959 232228 1.30662e-06 -0.0888293 0.0169552 9.95279e-06
cg15394630 1:798400 231194 7.55636e-07 -0.0366296 0.00681527 1.13151e-05
cg15394630 1:798959 231753 7.42208e-07 -0.0364781 0.00678152 1.13151e-05
cg23917638 1:798400 230899 1.26803e-09 -0.0743637 0.0108091 1.25392e-05
cg23917638 1:798959 231458 1.1988e-09 -0.0740977 0.0107505 1.25392e-05
cg18761878 1:798400 229925 6.4307e-06 -0.0381297 0.00788534 9.19423e-06
cg18761878 1:798959 230484 6.41223e-06 -0.0379537 0.00784774 9.19423e-06
cg08858441 1:798400 228973 1.56258e-12 -0.108516 0.0129452 7.97414e-06
'''
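import csv, json
# Assumes tab-separated columns; if your copy of the sample uses spaces, pass delimiter=' ' instead.
ds = list(csv.DictReader(s.split('\n'), delimiter='\t'))
# Build a column-oriented dict: one list per field, one entry per variant.
df = {
    'pvalue': [float(d['pval_nominal']) for d in ds],
    'chrom': [d['variant_id'].split(':')[0] for d in ds],
    'position': [int(d['variant_id'].split(':')[1]) for d in ds],
    'id': [d['variant_id'] for d in ds],
}
print(json.dumps(df))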
That produces:
{
"pvalue": [7.30894e-08, 7.31282e-08, 3.22535e-05, 3.25037e-05, 1.3333e-06, 1.30662e-06, 7.55636e-07, 7.42208e-07, 1.26803e-09, 1.1988e-09, 6.4307e-06, 6.41223e-06, 1.56258e-12],
"chrom": ["1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"],
"position": [798400, 798959, 798400, 798959, 798400, 798959, 798400, 798959, 798400, 798959, 798400, 798959, 798400],
"id": ["1:798400", ...]
}
That's close enough to what http://portaldev.sph.umich.edu/api/v1/single/results/?filter=analysis%20in%2052%20and%20chromosome%20in%20%20%2716%27%20and%20position%20ge%2053809169%20and%20position%20le%2053829169 produces to plug it into LocusZoom I think.
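To make the layout explicit: the output above is column-oriented, so each key maps to a list and index i across all lists describes one variant. Below is a small sketch for sanity-checking a converted file before loading it. The function name `columns_to_rows` is hypothetical and not part of LocusZoom.js; the sample values are the first two rows of the data above.

```python
def columns_to_rows(columns):
    """Turn {'field': [v0, v1, ...]} into [{'field': v0}, {'field': v1}, ...].

    Raises ValueError if the column lists differ in length, which would mean
    the variants are misaligned.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths: %s" % sorted(lengths))
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


example = {
    "pvalue": [7.30894e-08, 7.31282e-08],
    "chrom": ["1", "1"],
    "position": [798400, 798959],
    "id": ["1:798400", "1:798959"],
}
rows = columns_to_rows(example)
print(rows[0])
```

Reading the rows back this way is a quick check that every p-value still lines up with its own position and id.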
That's great. I am not familiar with Python. Could you write a few more lines of code to read the data from a txt file and write the output to a JSON file, so that I only need to run the program on my txt files? Thanks.
Do you have just one trait, or lots of traits? How many variants?
A few traits (fewer than 10). The number of variants depends on the locus, but should be fewer than 1000.
Here's a program to do it:
#!/usr/bin/env python
import csv, json, sys

# Usage: python a.py yourfile.txt newfile.json
infilepath = sys.argv[1]
outfilepath = sys.argv[2]
print('Converting from {} to {}'.format(infilepath, outfilepath))

# Read the tab-separated input, skipping blank lines.
with open(infilepath) as f:
    lines = (line for line in f if line.strip())
    objs = list(csv.DictReader(lines, delimiter='\t'))

# Build a column-oriented dict: one list per field, one entry per variant.
df = {
    'pvalue': [float(d['pval_nominal']) for d in objs],
    'chrom': [d['variant_id'].split(':')[0] for d in objs],
    'position': [int(d['variant_id'].split(':')[1]) for d in objs],
    'id': [d['variant_id'] for d in objs],
}

# Write the result as JSON.
with open(outfilepath, 'w') as f:
    json.dump(df, f)
Save that as `a.py`. Then run `python a.py yourfile.txt newfile.json`.
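Since the sample mixes several traits (the `gene_id` column) in one file, here is a rough variant that groups rows by trait. It rests on two assumptions: that columns may be separated by any whitespace rather than strictly by tabs, and that one column-oriented object per trait is the shape you want. The function name `convert_by_trait` is made up for illustration.

```python
import json
from collections import defaultdict


def convert_by_trait(lines):
    """Group whitespace-delimited rows by gene_id into column-oriented dicts."""
    rows = [line.split() for line in lines if line.strip()]
    header, records = rows[0], rows[1:]
    by_trait = defaultdict(
        lambda: {"pvalue": [], "chrom": [], "position": [], "id": []}
    )
    for fields in records:
        d = dict(zip(header, fields))
        chrom, pos = d["variant_id"].split(":")
        cols = by_trait[d["gene_id"]]
        cols["pvalue"].append(float(d["pval_nominal"]))
        cols["chrom"].append(chrom)
        cols["position"].append(int(pos))
        cols["id"].append(d["variant_id"])
    return dict(by_trait)


sample = """gene_id variant_id tss_distance pval_nominal slope slope_se pval_nominal_threshold
cg17149495 1:798400 267441 7.30894e-08 -0.112488 0.0189413 1.00921e-05
cg17149495 1:798959 268000 7.31282e-08 -0.111954 0.0188518 1.00921e-05
cg02288058 1:798400 232797 3.22535e-05 -0.0551874 0.0125136 3.61876e-05
"""
result = convert_by_trait(sample.splitlines())
print(json.dumps(result, indent=2))
```

Each top-level key is one trait, so each value could be written to its own JSON file if you want one data source per trait.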
@pjvandehaar That's great. Thanks so much...
Hello:
Is there any way to use custom GWAS/eQTL data in LocusZoom.js? And how do I add it as a data source (the data are in txt files, with no API support)? Thanks.