opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Odds ratios are not harmonised in the GWAS Catalog curated association ingestion #3461

Closed DSuveges closed 2 months ago

DSuveges commented 2 months ago

As reported by Jake Fremier, the betas in the PICSed, curated GWAS Catalog association set do not follow the expected distribution:

[image: histogram of beta values in the PICSed curated associations]

On investigation, betas are harmonised but odds ratios (ORs) are ignored. Because the association effect is stored in the same column regardless of whether it is an OR or a beta, ORs are simply propagated as-is. The required logic already exists in the GWAS Catalog datasource code; it just needs to be "turned on".
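The prototyped fix itself is not shown in this thread. As a minimal sketch of the idea only, assuming each row carries a label saying whether its effect is an OR or a beta (the names `harmonise_effect`, `ODDS_RATIO` and `BETA` below are hypothetical and not the gentropy API), harmonisation amounts to putting ORs on the log scale, since the natural log of an odds ratio is the corresponding logistic-regression beta:

```python
import math
from typing import Optional

# Hypothetical effect-type labels; the real ingestion code may encode
# the OR/beta distinction differently.
ODDS_RATIO = "odds_ratio"
BETA = "beta"


def harmonise_effect(effect: Optional[float], effect_type: str) -> Optional[float]:
    """Return the effect on the beta (log-odds) scale.

    Betas pass through unchanged; odds ratios are log-transformed.
    """
    if effect is None:
        return None
    if effect_type == ODDS_RATIO:
        if effect <= 0:
            # An odds ratio must be strictly positive; treat anything else as invalid.
            return None
        return math.log(effect)
    if effect_type == BETA:
        return effect
    raise ValueError(f"unknown effect type: {effect_type!r}")
```

Left unconverted, ORs are strictly positive and typically close to 1, so mixing them with betas in one column pulls the distribution away from being centred on zero. After the log transform, an OR of 1 maps to a beta of 0, and reciprocal ORs (e.g. 0.5 and 2) map to betas of equal magnitude and opposite sign.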

Once the fix is in place, make sure sufficient tests are also added to the codebase.

DSuveges commented 2 months ago

The fix has been prototyped. The distribution of betas before the fix:

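from pyspark.sql import SparkSession
from pyspark.sql import functions as f

# Reuses the notebook's SparkSession if one is already running.
spark = SparkSession.builder.getOrCreate()

(
    spark.read.parquet('/Users/dsuveges/project_data/gentropy/credible_set/gwas_catalog_PICSed_curated_associations')
    # Keep non-null betas within (-1.5, 1.5) so outliers do not dominate the histogram.
    .filter(
        (f.col('beta').isNotNull())
        & (f.col('beta') > -1.5)
        & (f.col('beta') < 1.5)
    )
    .select('beta')
    .toPandas()
    .hist(bins=100)
)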

[image: histogram of beta values before the fix]

After harmonisation, this becomes:

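# beta_harmonised_df: the curated associations after the prototyped fix
# (its construction is not shown here). Same filter as above.
(
    beta_harmonised_df
    .filter(
        (f.col('beta').isNotNull())
        & (f.col('beta') > -1.5)
        & (f.col('beta') < 1.5)
    )
    .select('beta')
    .toPandas()
    .hist(bins=100)
)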

[image: histogram of beta values after harmonisation]

project-defiant commented 2 months ago

The one thing I don't understand is why the distribution reported initially looks somewhat different from the one you, @DSuveges, were able to reproduce.

jfreimer commented 2 months ago

@project-defiant the plots I initially sent were from a subset of studies we were interested in, rather than the full catalog, which is probably why the distribution looks different.