opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

do LD expansion for disease top loci using spark #2150

Closed by elipapa 6 years ago

elipapa commented 6 years ago

Probably using a Spark job against the 1000 Genomes phase 3 EUR super-population LD table in BigQuery: https://bigquery.cloud.google.com/table/genomics-public-data:linkage_disequilibrium_1000G_phase_3.super_pop_EUR?tab=preview

For now we will use the POSTGAP table and load it into ClickHouse.

edm1 commented 6 years ago

I've finished the new analysis, which calculates LD in the study-specific population (using mapping here). If the GWAS draws on a mixture of populations, it applies the Fisher Z-transform to the correlation coefficients and takes an average weighted by each population's sample size.
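To make the mixed-population step concrete, here is a minimal sketch of a sample-size-weighted Fisher Z average. It is not taken from the v2d_data code. The function name, the clipping limit and the final back-transform with tanh are assumptions for illustration, and it assumes the inputs are signed correlation coefficients (R) rather than R².

```python
import math

# Assumption: correlations of exactly +/-1 are clipped just inside the open
# interval, because atanh is undefined at the boundaries.
CLIP_LIMIT = 1.0 - 1e-9


def fisher_z_weighted_mean(correlations, sample_sizes):
    """Combine per-population correlations into one value.

    Each R is Fisher Z-transformed (atanh), the Z values are averaged with
    weights equal to the population sample sizes, and the mean is transformed
    back to the correlation scale (tanh). The back-transform is an assumption;
    the thread only describes the transform and the weighted average.
    """
    if len(correlations) != len(sample_sizes):
        raise ValueError("need one sample size per correlation")
    total = sum(sample_sizes)
    if total <= 0:
        raise ValueError("sample sizes must sum to a positive number")

    z_values = [
        math.atanh(max(-CLIP_LIMIT, min(CLIP_LIMIT, r))) for r in correlations
    ]
    z_mean = sum(n * z for n, z in zip(sample_sizes, z_values)) / total
    return math.tanh(z_mean)


if __name__ == "__main__":
    # Hypothetical study: 70% of samples from one population, 30% from another.
    print(fisher_z_weighted_mean([0.95, 0.60], [7000, 3000]))
```

Averaging on the Z scale rather than averaging R directly gives more weight to strong correlations near ±1, where the sampling distribution of R is skewed.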

Here are a few stats comparing my (custom) analysis to the POSTGAP LD:

Statistic                        custom               postgap
Unique studies                   5376                 4716
Unique index variants            59546                38197
Unique tag variants              1107234              765918
Tag variants per index variant   18.59459913344305    20.051784171531796
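As a quick arithmetic check, the "tag variants per index" figures appear to equal the number of unique tag variants divided by the number of unique index variants. The sketch below recomputes that ratio from the counts quoted above. The dictionary layout and names are illustrative only.

```python
# Counts copied from the comparison above; the structure is hypothetical.
counts = {
    "custom": {"index_variants": 59546, "tag_variants": 1107234},
    "postgap": {"index_variants": 38197, "tag_variants": 765918},
}

# Ratio of unique tag variants to unique index variants per analysis.
tags_per_index = {
    name: c["tag_variants"] / c["index_variants"] for name, c in counts.items()
}

if __name__ == "__main__":
    for name, ratio in tags_per_index.items():
        print(f"{name}: {ratio:.4f}")
```

On these numbers the custom analysis covers more index variants overall, with slightly fewer tag variants per index variant than POSTGAP.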

The output is in staging: gs://genetics-portal-staging/v2d/180913/ld.tsv.gz

Code: https://github.com/opentargets/v2d_data

edm1 commented 6 years ago

LD is now calculated in the GWAS-specific population: https://github.com/opentargets/v2d_data/blob/master/2_calculate_LD_table.Snakefile

I still need to write documentation for this.