opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Recover EFO mappings for FINNGEN studies #3280

Open xyg123 opened 2 months ago

xyg123 commented 2 months ago

Background

The most recent release (24.03) of the study index does not contain EFO mappings for 2,408 studies from FINNGEN. Currently, reading the entire study index with the following command yields a schema with no EFO column:

study_index = session.spark.read.parquet(study_index_path, recursiveFileLookup=True)
study_index.printSchema()

root
 |-- studyId: string (nullable = true)
 |-- projectId: string (nullable = true)
 |-- studyType: string (nullable = true)
 |-- traitFromSource: string (nullable = true)   * This is just the trait name, e.g. Depressed affect, mood disorder
 |-- geneId: string (nullable = true)
 |-- tissueFromSourceId: string (nullable = true)
 |-- nSamples: integer (nullable = true)
 |-- summarystatsLocation: string (nullable = true)
 |-- hasSumstats: boolean (nullable = true)
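A quick way to confirm the gap programmatically is to compare the columns that were read against the ones expected. The sketch below is illustrative only: `missing_columns` is a hypothetical helper, and the EFO column name `traitFromSourceMappedIds` is an assumption that this issue does not confirm. The observed column list is copied from the schema printed above.

```python
# Assumed name of the EFO mapping column; not confirmed in this issue.
EFO_COLUMN = "traitFromSourceMappedIds"


def missing_columns(columns, expected):
    """Return the expected column names that are absent from `columns`, in order."""
    present = set(columns)
    return [name for name in expected if name not in present]


# Column names as printed by printSchema() in this issue.
observed = [
    "studyId",
    "projectId",
    "studyType",
    "traitFromSource",
    "geneId",
    "tissueFromSourceId",
    "nSamples",
    "summarystatsLocation",
    "hasSumstats",
]

print(missing_columns(observed, ["studyId", EFO_COLUMN]))
```

With a Spark DataFrame, the same check could take `study_index.columns` as its first argument.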

This is despite 79,861 GWAS Catalog studies (out of 79,872) in the study index having EFO mappings.

We have manually curated EFO mappings for 2,841 FINNGEN studies from the old genetics pipeline.

Changing the studyId prefix in the curated mappings from FINNGENR6... to FINNGENR10... allows direct recovery of 1,858 (~75%) of the EFO mappings for this release.
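The prefix swap could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: `remap_release` and `recover_mappings` are hypothetical helpers, the prefixes are taken as written in this issue (the real studyId format may differ), and the study IDs and EFO values in the example are made-up placeholders.

```python
def remap_release(study_id, old_prefix="FINNGENR6", new_prefix="FINNGENR10"):
    """Swap the release prefix of a study ID; IDs without the old prefix are unchanged."""
    if study_id.startswith(old_prefix):
        return new_prefix + study_id[len(old_prefix):]
    return study_id


def recover_mappings(curated, current_ids,
                     old_prefix="FINNGENR6", new_prefix="FINNGENR10"):
    """Return {current studyId: EFO ids} for studies found in the remapped curation."""
    remapped = {
        remap_release(study_id, old_prefix, new_prefix): efo_ids
        for study_id, efo_ids in curated.items()
    }
    return {sid: remapped[sid] for sid in current_ids if sid in remapped}


# Placeholder data, for illustration only.
curated = {
    "FINNGENR6_TRAIT_A": ["EFO_PLACEHOLDER_1"],
    "FINNGENR6_TRAIT_B": ["EFO_PLACEHOLDER_2"],
}
current_ids = ["FINNGENR10_TRAIT_A", "FINNGENR10_TRAIT_C"]

recovered = recover_mappings(curated, current_ids)
recovery_rate = len(recovered) / len(current_ids)
print(recovered, recovery_rate)
```

Studies in the current release with no counterpart after the swap (here `FINNGENR10_TRAIT_C`) are left out of the result and would still need a mapping from another source.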

Tasks