opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

PPP 2024.08 data release intentions #3412

Closed project-defiant closed 1 month ago

project-defiant commented 2 months ago

As a developer, I want to describe what the internal data release 2024.08 should cover

Background

Release intentions for the 2024.08 genetics data release

Tasks

@addramir @tskir @Daniel-Considine @xyg123 FYI

project-defiant commented 2 months ago

Create manual curation spreadsheet

To generate a new curation file:

  1. From gentropy, run GWAS_Catalog_update.sh. This fetches the latest release metadata from GWAS Catalog and overwrites the curation file with the latest one hosted in the curation repo.
  2. Run make build from the gentropy dev version. This uploads the gentropy package and config to the GCS bucket. These files are synced to the Dataproc cluster master and worker environments, and gentropy is installed via the install_dependencies pig job.
  3. Run the gwas_curation_update DAG.

The DAG run produces the file gs://genetics_etl_python_playground/input/v2d/genetics_etl_python_playground/input/v2d/GWAS_Catalog_study_curation_2024-08-12.tsv. I checked the file and found ~130 duplicated study IDs, listed in GWAS_catalog_study_curation_cuplidates_2024-08-12.txt.

The duplicates come from the published and unpublished summary statistics lists in the GWAS Catalog metadata files: the same sumstats appear in both lists.

After removing the duplicates, the complete list contains 45173 unique study IDs. It can be found in gs://genetics_etl_python_playground/input/v2d/genetics_etl_python_playground/input/v2d/GWAS_Catalog_study_curation_dedup_2024-08-12.tsv. I also updated the curation list on Google Docs.
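
For reference, the duplicate check described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the column names `studyId` and `publicationStatus` and the sample rows are assumptions made for the example, and the real curation TSV may be laid out differently.

```python
import csv
import io
from collections import Counter


def split_duplicates(rows, id_column="studyId"):
    """Return (rows with the first occurrence of each ID, sorted duplicated IDs).

    `id_column` is a hypothetical column name; adjust it to the real header.
    """
    counts = Counter(row[id_column] for row in rows)
    seen = set()
    unique_rows = []
    for row in rows:
        if row[id_column] not in seen:
            seen.add(row[id_column])
            unique_rows.append(row)
    duplicated = sorted(sid for sid, n in counts.items() if n > 1)
    return unique_rows, duplicated


# Made-up sample standing in for the curation TSV (not real data).
sample = (
    "studyId\tpublicationStatus\n"
    "GCST000001\tpublished\n"
    "GCST000002\tpublished\n"
    "GCST000001\tunpublished\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
unique_rows, duplicated = split_duplicates(rows)
print(len(unique_rows), duplicated)
```

Keeping the first occurrence is an arbitrary choice in this sketch; whether the published or the unpublished entry should win is a curation decision.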

project-defiant commented 2 months ago

@xyg123 the prediction step was successful. You can check the results at gs://genetics_etl_python_playground/releases/24.08+szsz/

project-defiant commented 1 month ago

During result validation, @xyg123 pointed out that all of the PICS credible sets had PosteriorProbability equal to 1. The genetics team tried to track down the cause of this PP distribution. Rerunning the Gwas_catalog_processing DAG fixed the issue.

However, a new issue with the eCAVIAR colocalisation step was raised when the recalculated credible sets were submitted.
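
A sanity check for the degenerate PP distribution mentioned above could look something like the sketch below. It is a hypothetical helper, not part of gentropy, and it assumes the posterior probabilities have already been extracted into a plain list of floats; the sample values are made up.

```python
import math


def all_pp_equal_one(posterior_probabilities, tol=1e-9):
    """Return True if the input is non-empty and every value is 1 within `tol`."""
    values = list(posterior_probabilities)
    return bool(values) and all(
        math.isclose(p, 1.0, abs_tol=tol) for p in values
    )


# Made-up values for illustration only.
degenerate = [1.0, 1.0, 1.0]
plausible = [0.62, 0.30, 0.08]

print(all_pp_equal_one(degenerate))
print(all_pp_equal_one(plausible))
```

A check like this only flags the specific symptom seen here (every PP equal to 1); it says nothing about the underlying cause.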

project-defiant commented 1 month ago

After discussion with the genetics team, the release is postponed.