opentargets / issues

Issue tracker for Open Targets Platform and Open Targets Genetics Portal
https://platform.opentargets.org https://genetics.opentargets.org
Apache License 2.0

Add genetics ETL step to generate association data #3599

Closed DSuveges closed 2 weeks ago

DSuveges commented 3 weeks ago

The genetics ETL aggregates l2g predictions and study data to build disease/target evidence. The platform ETL picks up this dataset and integrates it with other evidence sources to build disease/target associations. Evaluating l2g prediction performance is best done on the association data, but producing that data currently requires a full ETL run, which makes l2g iteration very slow.

The aim of this issue is to add a step to gentropy, and subsequently add one more task to the ETL orchestration, to build the direct and indirect evidence datasets.

These two datasets need to be saved as parquet files together with the other ETL output. Important: these datasets are not ingested by the platform ETL.
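
To make the direct/indirect distinction concrete, here is a minimal sketch in plain Python. It is not the gentropy implementation. It assumes that evidence rows are `(disease_id, target_id, score)` tuples, that scores are aggregated with a harmonic sum, and that "indirect" means evidence is also counted for every ancestor of its disease in the ontology. The function names, the `ancestors` mapping and the identifiers are hypothetical placeholders.

```python
from collections import defaultdict


def harmonic_sum(scores):
    """Sum the scores in descending order, dividing each by its rank squared."""
    ordered = sorted(scores, reverse=True)
    return sum(score / (rank ** 2) for rank, score in enumerate(ordered, start=1))


def build_associations(evidence, ancestors=None):
    """Aggregate evidence rows into one score per (disease, target) pair.

    evidence:  iterable of (disease_id, target_id, score) tuples.
    ancestors: optional mapping of disease_id -> set of ancestor disease ids
               (assumed input). If omitted, the result is the direct dataset;
               if given, each evidence row is also counted for every ancestor,
               which yields the indirect dataset.
    """
    grouped = defaultdict(list)
    for disease_id, target_id, score in evidence:
        diseases = {disease_id}
        if ancestors is not None:
            diseases |= set(ancestors.get(disease_id, ()))
        for disease in diseases:
            grouped[(disease, target_id)].append(score)
    return {pair: harmonic_sum(scores) for pair, scores in grouped.items()}


# Placeholder identifiers, for illustration only.
evidence = [
    ("EFO_CHILD", "ENSG_A", 0.8),
    ("EFO_CHILD", "ENSG_A", 0.4),
    ("EFO_PARENT", "ENSG_A", 0.6),
]
ancestors = {"EFO_CHILD": {"EFO_PARENT"}}

direct = build_associations(evidence)
indirect = build_associations(evidence, ancestors)
```

In this toy input the parent disease receives only its own evidence in `direct`, while in `indirect` it also receives the two child-disease rows. An actual step would presumably read the evidence with Spark and write both results as parquet next to the other ETL output; that part is omitted here.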