opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Load the latest version of the gentropy outputs #3569

Open d0choa opened 1 month ago

d0choa commented 1 month ago

This is a new iteration of the data, making #3567 obsolete:

❯ gsutil ls gs://ot_orchestration/releases/24.10_freeze2/
gs://ot_orchestration/releases/24.10_freeze2/locus_to_gene_gold_standard.json
gs://ot_orchestration/releases/24.10_freeze2/biosample_index/
gs://ot_orchestration/releases/24.10_freeze2/colocalisation/
gs://ot_orchestration/releases/24.10_freeze2/credible_set/
gs://ot_orchestration/releases/24.10_freeze2/gene_index/
gs://ot_orchestration/releases/24.10_freeze2/invalid_credible_set/
gs://ot_orchestration/releases/24.10_freeze2/invalid_study_index/
gs://ot_orchestration/releases/24.10_freeze2/locus_to_gene_feature_matrix/
gs://ot_orchestration/releases/24.10_freeze2/locus_to_gene_model/
gs://ot_orchestration/releases/24.10_freeze2/locus_to_gene_predictions/
gs://ot_orchestration/releases/24.10_freeze2/manifests/
gs://ot_orchestration/releases/24.10_freeze2/study_index/
gs://ot_orchestration/releases/24.10_freeze2/variant_index/
gs://ot_orchestration/releases/24.10_freeze2/variants/

A frozen description of all schemas can be found here.

The changes in the schemas, added to the ones described in #3567 are:

The full changelog of this dataset compared to 24.10_freeze1:

A probably incomplete list of things that we know are pending in this dataset from a semantic perspective:

This pipeline run was based on the gentropy branch szsz-nonmandatory-params-to-coloc-step (due to some adjustments required on the coloc priors) and the do_etl_quick_iterations branch in the orchestration package. I will try to bring the changes into dev in both repositories, as this gave us an almost perfect run. (@ireneisdoomed, the feature matrix step crashed because of the broadcasting issue. I disabled auto-broadcast through configuration, similar to train/predict, and it worked well.)
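For anyone reproducing the workaround, here is a minimal sketch of disabling automatic broadcast joins in a Spark session. It assumes that "auto-broadcast" refers to Spark's `spark.sql.autoBroadcastJoinThreshold` setting (a value of -1 turns automatic broadcast joins off); the helper name `build_session` and the application name are hypothetical, and the actual orchestration configuration may set this differently.

```python
# Assumption: the crash came from Spark's automatic broadcast joins, which are
# controlled by spark.sql.autoBroadcastJoinThreshold (-1 disables them).
SPARK_OVERRIDES = {"spark.sql.autoBroadcastJoinThreshold": "-1"}


def build_session(app_name: str = "l2g_feature_matrix"):
    """Create a SparkSession with the overrides applied (hypothetical helper)."""
    # Imported lazily so the module can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in SPARK_OVERRIDES.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same key can also be passed at submit time as a `--conf` option rather than in code.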

d0choa commented 1 month ago

This data has been superseded by: gs://ot_orchestration/releases/24.10_freeze3/ (generated by @project-defiant)

Changelog compared to 24.10_freeze2:

A probably incomplete list of things that we know are pending in this dataset from a semantic perspective:

- Some finngen and eQTL catalogue credible sets contain duplicated variants inside the locus column
- PICS results have an issue in which variantId is not shown inside the locus object
- A fraction of QTLs (incl. iPS cells) are dropped due to EFO-based biosamples
- GWAS catalog study filtering based on curation + qc_sumstats
- hasSummaryStats field in the study index to be properly populated
- L2G evidence + any required adjustments
- Additional credible set validation (e.g. confirming PP sum 0.99-1)
- New GWAS Catalog SuSiE fine-mapping (pan-UKB LD reference, etc.)
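Two of the pending items (duplicated variants inside the locus column, and confirming that the PP sum falls within 0.99-1) could be checked roughly as follows. This is only a sketch in plain Python: it assumes each locus entry is a dict with `variantId` and `posteriorProbability` keys, and the function name, result keys, and tolerance are hypothetical. A real check would run on the credible_set dataset in Spark.

```python
from collections import Counter


def validate_credible_set(locus, pp_min=0.99, pp_max=1.0, tol=1e-6):
    """Report duplicated variants and whether the PP sum is within range.

    `locus` is assumed to be a list of dicts with "variantId" and
    "posteriorProbability" keys (an assumption about the schema).
    """
    variant_ids = [row["variantId"] for row in locus]
    # A variant is duplicated if it appears more than once in the locus.
    duplicated = sorted(v for v, n in Counter(variant_ids).items() if n > 1)
    pp_sum = sum(row["posteriorProbability"] for row in locus)
    return {
        "duplicatedVariants": duplicated,
        "ppSum": pp_sum,
        # Small tolerance to absorb floating-point error at the bounds.
        "ppSumInRange": pp_min - tol <= pp_sum <= pp_max + tol,
    }


# Illustrative input only; these variants are made up.
example_locus = [
    {"variantId": "1_100_A_G", "posteriorProbability": 0.7},
    {"variantId": "1_200_C_T", "posteriorProbability": 0.295},
]
report = validate_credible_set(example_locus)
```

Returning a report rather than raising keeps the check usable for counting how many credible sets fail each condition.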

@DSuveges will report some metrics to make sure data is within the expected ranges

project-defiant commented 1 month ago

@d0choa https://github.com/opentargets/orchestration/pull/41#event-14696106544 <- configuration update for the last run

prashantuniyal02 commented 1 month ago

24.10_freeze4

> gsutil ls gs://ot_orchestration/releases/24.10_freeze4/
gs://ot_orchestration/releases/24.10_freeze4/locus_to_gene_gold_standard.json
gs://ot_orchestration/releases/24.10_freeze4/biosample_index/
gs://ot_orchestration/releases/24.10_freeze4/colocalisation/
gs://ot_orchestration/releases/24.10_freeze4/credible_set/
gs://ot_orchestration/releases/24.10_freeze4/gene_index/
gs://ot_orchestration/releases/24.10_freeze4/invalid_credible_set/
gs://ot_orchestration/releases/24.10_freeze4/invalid_study_index/
gs://ot_orchestration/releases/24.10_freeze4/locus_to_gene_feature_matrix/
gs://ot_orchestration/releases/24.10_freeze4/locus_to_gene_model/
gs://ot_orchestration/releases/24.10_freeze4/locus_to_gene_predictions/
gs://ot_orchestration/releases/24.10_freeze4/manifests/
gs://ot_orchestration/releases/24.10_freeze4/study_index/
gs://ot_orchestration/releases/24.10_freeze4/variant_index/
gs://ot_orchestration/releases/24.10_freeze4/variants/
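To confirm that a new freeze exposes the same top-level datasets as an earlier one, the two `gsutil ls` outputs can be compared with a small script. This is a sketch with hypothetical helper names; it only compares entry names under each release root and says nothing about the contents of the datasets.

```python
def dataset_names(listing: str, release_root: str) -> set[str]:
    """Extract top-level entry names from `gsutil ls` output for one release."""
    names = set()
    for line in listing.splitlines():
        line = line.strip()
        # Skip prompt lines and anything that is not under the release root.
        if line.startswith(release_root):
            name = line[len(release_root):].strip("/")
            if name:
                names.add(name)
    return names


def compare_releases(old_listing, old_root, new_listing, new_root):
    """Return entries missing from, and added in, the new release."""
    old_names = dataset_names(old_listing, old_root)
    new_names = dataset_names(new_listing, new_root)
    return {
        "missing": sorted(old_names - new_names),
        "added": sorted(new_names - old_names),
    }
```

Called with the listings and roots of two releases, for example `gs://ot_orchestration/releases/24.10_freeze2/` and `gs://ot_orchestration/releases/24.10_freeze4/`, empty `missing` and `added` lists mean the top-level layout is unchanged.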


d0choa commented 1 month ago

Changelog for 24.10_freeze4:

All the changes above should not structurally change the data, but they will fix some issues in the FE. I mentioned to @jdhayhurst that we have made a schema change in studyIndex regarding the sum stats QC columns. I can't remember if we picked up these changes for this data release but if you experience issues with study index this is why.

Known issues as of 22nd of October:

project-defiant commented 1 month ago

@jdhayhurst The changes in the study index refer to the sumstatQCValues key:value map we discussed.

addramir commented 3 weeks ago

Is it finished?

prashantuniyal02 commented 3 weeks ago

30/10: BE will wait until Friday for 24.10_freeze6 before loading the data. If it is not available by then, load freeze5.

project-defiant commented 3 weeks ago

Changelog for 24.10_freeze5