elipapa closed this issue 6 years ago.
I've finished the new analysis, which calculates LD in the study-specific population (using the mapping here). If a GWAS was run on a mixture of populations, the analysis Fisher Z-transforms the per-population correlation coefficients and averages them, weighting each population by its sample size.
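A minimal sketch of the mixed-population step, assuming the per-population r values for one index/tag pair have already been computed. The function name `combine_ld`, the population labels, the clipping constant `eps` and the final tanh back-transform to the r scale are all assumptions for illustration. They are not taken from the v2d_data Snakefile.

```python
import math


def combine_ld(r_by_pop, n_by_pop, eps=1e-6):
    """Combine per-population LD correlations (r) into a single r.

    Hypothetical helper: Fisher Z-transform each r, take the mean weighted
    by population sample size, then back-transform with tanh (the
    back-transform is an assumption, not confirmed by the source).
    """
    total_n = sum(n_by_pop[pop] for pop in r_by_pop)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    z_sum = 0.0
    for pop, r in r_by_pop.items():
        # Clip so atanh stays finite when r is exactly +1 or -1.
        r = max(-1.0 + eps, min(1.0 - eps, r))
        z_sum += n_by_pop[pop] * math.atanh(r)
    # Weighted mean on the Z scale, mapped back to the correlation scale.
    return math.tanh(z_sum / total_n)
```

For example, `combine_ld({"EUR": 0.2, "EAS": 0.9}, {"EUR": 1000, "EAS": 3000})` returns a value between 0.2 and 0.9, pulled towards the larger population. Averaging on the Z scale rather than averaging r directly avoids the compression of r near ±1.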
Here are a few stats comparing my custom analysis with the postgap LD:
| Metric | custom | postgap |
|---|---|---|
| Number of unique studies | 5376 | 4716 |
| Number of unique index variants | 59546 | 38197 |
| Number of unique tag variants | 1107234 | 765918 |
| Tag variants per index (unique tags / unique indexes) | 18.59459913344305 | 20.051784171531796 |
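The comparison figures could be reproduced from an LD table along these lines. This is a hedged sketch: the column names `study_id`, `index_variant_id` and `tag_variant_id`, and the helper names, are hypothetical and may not match the real ld.tsv.gz header. The per-index figure is computed as unique tag variants divided by unique index variants, since 1107234 / 59546 and 765918 / 38197 reproduce the 18.59 and 20.05 reported.

```python
import csv
import gzip


def summarise_ld(rows):
    """Count unique studies, index variants and tag variants in LD rows.

    Column names are hypothetical placeholders for the real table header.
    """
    studies, indexes, tags = set(), set(), set()
    for row in rows:
        studies.add(row["study_id"])
        indexes.add(row["index_variant_id"])
        tags.add(row["tag_variant_id"])
    return {
        "unique_studies": len(studies),
        "unique_index_variants": len(indexes),
        "unique_tag_variants": len(tags),
        # Ratio of unique tags to unique indexes (tags shared between
        # indexes are counted once), guarded against an empty table.
        "tags_per_index": len(tags) / len(indexes) if indexes else 0.0,
    }


def summarise_ld_file(path):
    """Stream a gzipped, tab-separated LD table with a header row."""
    with gzip.open(path, "rt", newline="") as fh:
        return summarise_ld(csv.DictReader(fh, delimiter="\t"))
```

Because only the three ID sets are kept in memory, the file is streamed row by row rather than loaded whole.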
The output is in staging at gs://genetics-portal-staging/v2d/180913/ld.tsv.gz
LD is now calculated in the GWAS-specific population: https://github.com/opentargets/v2d_data/blob/master/2_calculate_LD_table.Snakefile
I still need to write documentation for this.
Probably using a Spark job against https://bigquery.cloud.google.com/table/genomics-public-data:linkage_disequilibrium_1000G_phase_3.super_pop_EUR?tab=preview
For now, we will use the postgap table and load it into ClickHouse.